Modulation of gene expression using localization domains

ABSTRACT

Methods and compositions for regulating gene expression are provided. In particular, methods and compositions comprising localization domains, and fusions of localization domains with DNA binding domains and, optionally, regulatory domains, are provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional application of U.S. patent application Ser. No. 09/967,869 filed Sep. 28, 2001, now allowed, which is related to U.S. Provisional Patent Application Ser. No. 60/236,884, filed 29 Sep. 2000, from which priority is claimed under 35 USC §119(e)(1), and which applications are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present disclosure is in the field of gene regulation, specifically, using compositions containing localization domain polypeptides, or functional fragments thereof, to modulate gene expression.

BACKGROUND

The development of an organism and the ultimate function of any given cell in that organism depend on the particular set of genes being expressed (e.g., transcribed and translated) in the cell. Since virtually all the genes in the human genome have now been sequenced, the challenge now is to understand the molecular mechanisms that allow these genes to be selectively expressed.

In vertebrates, DNA methylation of CpG dinucleotides has long been identified as an important mechanism of development. DNA methylation is required for normal development (Ohki et al. (1999) EMBO J 18:6653-6661; Okano et al. (1999) Cell 99:247-257); is correlated with genomic imprinting (Ashburner (1972) Results Probl Cell Differ 4:101-151; Grunstein et al. (1997) Nature 389:349-352) and is involved in X-chromosome inactivation (Heard et al. (1997) Annual Rev Genet 31:571-610). A large body of evidence indicates that cytosine methylation leads to the assembly of a specialized, heritable, repressive chromatin architecture through the recruitment of histone deacetylases (Bird and Wolffe (1999) Cell 99:451-454; Siegfried et al. (1997) Curr Biol 7:R305-307). However, the precise role of DNA methylation in tissue specific regulation of non-imprinted genes remains contentious (Bird (1997) Trends Genet 13:469-472).

Thus, DNA methylation appears to be critical in vertebrate development, which relies upon the imposition of progressively more stable states of transcriptional repression (Steinbach et al. (1997) Nature 389:395-399; Mannervik et al. (1999) Science 284:606-609). Further, DNA methylation may play a role in partitioning the genome, and the chromosomal infrastructure within which it is packaged, into active and inactive intranuclear compartments (Bird et al. (1995) Trends Genet. 11:94-99). For example, mouse primordial germ cells, embryonic stem cells and the cells of the blastocyst can progress through the cell cycle and divide without detectable DNA methylation (Lei et al. (1996) Development 122:3195-3205). Once differentiation begins, however, DNA methylation becomes essential for individual cell viability (Li et al. (1992) Cell 69:915-926; Okano et al. (1999) Cell 99:247-257).

DNA methylation has also been implicated in clinical disease states. Parasitic DNA, e.g., retrotransposons, retrovirus genomes, lentivirus genomes, L1 elements and Alu elements, are known to be CpG rich. It has been proposed that DNA methylation may have arisen as a genome-defense system to silence expression of these parasitic elements and limit their spread through the genome (Yoder et al. (1997) Trends Genet. 13:335-340; Colot et al. (1999) BioEssays 21:402-411). Additionally, several genetic diseases have been described that cause methylation defects, including the ICF syndrome (Xu et al. (1999) Nature 402:187-189), Rett syndrome (Amir et al. (1999) Nature Genet. 23:185-188) and fragile X syndrome (Oberle et al. (1991) Science 252:1711-1714).
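
As an illustration of what "CpG rich" can mean in practice, the following minimal sketch counts CpG dinucleotides in a sequence and computes the commonly used observed/expected CpG ratio (number of CpG × length / (number of C × number of G)). The function name and the example sequence are hypothetical and are not part of the disclosure; no particular threshold for CpG richness is implied.

```python
def cpg_stats(seq):
    """Return (CpG count, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # a CpG dinucleotide reads 5'-CG-3' on one strand
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Observed/expected ratio; defined as 0.0 here when C or G is absent.
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, gc_fraction, obs_exp


# Hypothetical example sequence, for illustration only.
count, gc, ratio = cpg_stats("ACGCGT")
print(count, round(gc, 3), ratio)
```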

Cellular DNA methylation patterns seem to be established by a complex interplay of at least three independent DNA methyltransferases: DNMT1, DNMT3A and DNMT3B (Kaludov and Wolffe (2000) Nuc Acids Res 28:1921-1928, and references cited therein). Methyltransferases are required for de novo methylation that occurs in the genome following embryo implantation and for the de novo methylation of newly integrated retroviral sequences in mouse ES cells (Okano et al. (1999) Cell 99:247-257). Proteins having significant homology to vertebrate methyltransferases have been identified in zebrafish, Arabidopsis thaliana and maize (Okano et al. (1998) Nature Genet 19:219-220; Cao et al. (2000) PNAS USA 97:4979-4984).

In addition to the methyltransferases, a group of proteins which bind to methylated CpG sequences has also been identified. The methyl-CpG-binding protein MECP2 is the best characterized. MECP2 has been shown to selectively recognize methylated DNA and to repress transcription in methylated regions of the genome (Lewis et al. (1992) Cell 69:905-914). MECP2 contains at least two domains: the methyl-CpG-binding domain (MBD), which recognizes symmetrically methylated CpG dinucleotides through contacts in the major groove of the double helix (Wakefield et al. (1999) J. Mol. Biol. 291:1055-1065) and a transcriptional repression domain (TRD), which interacts with several other regulatory proteins (Nan et al. (1997) Cell 88:471-481). MECP2 selectively represses transcription of methylated templates in the absence of an organized chromatin structure and, when tethered to a specific heterologous Gal4-binding domain, its TRD confers transcriptional repression by interacting with TFIIB, a component of the basal transcription machinery (Kaludov and Wolffe (2000) Nucleic Acids Res. 28:1921-1928). Methyl binding domain proteins associate with corepressor complexes that include histone deacetylases. Methyl CpG binding proteins have also been shown to be components of chromatin-remodeling complexes, for example the MECP2 repressor complex. Recruitment of a histone deacetylase occurs indirectly through its interaction with the Sin3A adaptor proteins, which causes transcriptional silencing, in part by deacetylation of histones, directing the formation of stable repressive chromatin structures.

Thus, methylation of DNA can repress transcription through multiple mechanisms (see, e.g., Kaludov and Wolffe (2000) Nuc Acids Res 28:1921-1928, and references cited therein). Pathways of repression include direct inhibition of transcription through the failure of transcription factors to associate with methylated recognition elements (Iguchi-Ariga et al. (1989) Genes Dev. 3:612-619) and indirect pathways involving either occlusion of methylated sequences by transcriptional repressors that recognize methylated DNA (Meehan et al. (1992) Nucleic Acids Res. 20:5085-5092) or the modification of chromatin structure targeted by methyl-CpG-specific transcriptional repressors (Buschhausen et al. (1987) PNAS USA 84:1177-1181; Kass et al. (1997) Curr. Biol. 7:157-165).

Despite the characterization of the functional properties of methyl-CpG-specific binding proteins and their constituent MBDs, it has not heretofore been possible to target the various functional activities of MBDs for use in specific and directed modulation of gene expression.

SUMMARY

In one aspect, methods of compartmentalizing a region of interest in cellular chromatin are provided. The methods comprise contacting the region of interest with a composition that binds to a binding site in cellular chromatin, wherein the binding site is in a gene of interest and wherein the composition comprises a localization domain or functional fragment thereof, and a DNA binding domain or functional fragment thereof. In certain embodiments, the composition is a fusion molecule, for example a fusion polypeptide. In other embodiments, the region of interest is compartmentalized into a nuclear compartment for packaging as heterochromatin. The methods are useful in a variety of cells, including but not limited to, plant cells and animal cells (e.g., human). The localization domain can be a methyl CpG binding domain obtained, for example, from MECP2, MBD1, MBD2, MBD3, dMBD-like and dMBD-likeΔ, or one or more functional fragments thereof. The DNA-binding domain can be, for example, a zinc finger protein or a triplex-forming nucleic acid or a minor groove binder. In certain embodiments, any of the methods described herein facilitate modulation of expression of a gene associated with the region of interest, for example repression of the gene. In other embodiments, the methods described herein further comprise the step of contacting a cell with a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide is expressed in the cell. The gene can encode any product, for example, vascular endothelial growth factor, erythropoietin, androgen receptor, PPAR-γ2, p16, p53, pRb, dystrophin and e-cadherin. Furthermore, in other embodiments, the region of interest is involved in disease states selected from the group consisting of ICF syndrome, Rett syndrome and Fragile X syndrome.

In another aspect, methods are provided for modulation of gene expression, wherein the methods comprise the step of contacting a region of DNA in cellular chromatin with a fusion molecule that binds to a binding site in cellular chromatin, wherein the binding site is in the gene and wherein the fusion molecule comprises a DNA binding domain and a localization domain, for example, a methyl CpG binding domain. Modulation of the gene can be, for example, repression of the target gene. The DNA-binding domain of the fusion molecule can be, for example, a zinc finger DNA-binding domain. Further, the DNA binding domain can bind to a variety of target sites, for example to a target site in a gene encoding a product selected from the group consisting of vascular endothelial growth factor, erythropoietin, androgen receptor, PPAR-γ2, p16, p53, pRb, dystrophin and e-cadherin. The localization domain can be a methyl CpG binding domain obtained from, for example, MECP2, MBD1, MBD2, MBD3, dMBD-like and dMBD-likeΔ, or one or more functional fragments thereof. In still further embodiments, the methods involve contacting cellular chromatin with a plurality of fusion molecules.

In other aspects, methods of modulating gene expression are provided, wherein the methods comprise the step of contacting a region of DNA in cellular chromatin with a fusion molecule that binds to a binding site in cellular chromatin, wherein the binding site is in the gene and wherein the fusion molecule comprises a DNA binding domain, a localization domain such as, for example, a methyl CpG binding domain, and a regulatory domain (such fusion molecules can include functional fragments of any of these domains). Modulation of gene expression can be, for example, repression (e.g., using a repression domain or functional fragment thereof as the transcriptional regulatory domain) or activation (e.g., using an activation domain, such as for example VP16, or a functional fragment thereof, as the transcriptional regulatory domain). The regulatory domain can also comprise a component of a chromatin remodeling complex (or a functional fragment thereof) with the capacity to recruit complexes capable of remodeling chromatin of the target gene into either a transcriptionally active or a transcriptionally inactive state, as desired. The DNA-binding domain of the fusion molecule can comprise a zinc finger DNA-binding domain. Further, the DNA binding domain can bind to any target site, for example a target site in a gene encoding a product selected from the group consisting of vascular endothelial growth factor, erythropoietin, androgen receptor, PPAR-γ2, p16, p53, pRb, dystrophin and e-cadherin. The localization domain can be a methyl CpG binding domain obtained, for example, from MECP2, MBD1, MBD2, MBD3, dMBD-like and dMBD-likeΔ, or one or more functional fragments thereof. In still further embodiments, a plurality of fusion molecules is contacted with cellular chromatin, wherein each of the fusion molecules binds to a distinct binding site, for example, to modulate expression of one or more genes.

In yet another aspect, a fusion polypeptide comprising a localization domain or functional fragment thereof, and a DNA binding domain or a functional fragment thereof, is provided. In certain embodiments, the fusion polypeptide also comprises a regulatory domain, for example an activation domain (e.g., VP-16, p65), a repression domain (e.g., KRAB, v-erbA) or a component of a chromatin remodeling complex. Any of the polypeptides described herein can include a DNA-binding domain which is a zinc finger DNA binding domain and a localization domain which can be, for example, a methyl CpG binding domain such as obtained, for example, from MECP2, MBD1, MBD2, MBD3, dMBD-like and dMBD-likeΔ or functional fragments thereof. These fusion polypeptides can bind, for example, to a target site in a gene encoding a product selected from the group consisting of vascular endothelial growth factor, erythropoietin, androgen receptor, PPAR-γ2, p16, p53, pRb, dystrophin and e-cadherin. Polynucleotides encoding any of the fusion polypeptides described herein are also provided, as are cells comprising the polypeptides and/or polynucleotides encoding the polypeptides.

These and other embodiments will be readily apparent to one of skill in the art upon reading the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B are sequence alignments depicting that Drosophila contains multiple proteins with significant similarity to vertebrate methyl CpG binding proteins.

FIG. 1A depicts the similarity of Drosophila proteins to the methyl CpG binding domain motif. The amino acid sequences corresponding to the methyl CpG binding motif of human MeCP2, human MBD1, human MBD4, human MBD2, and Xenopus MBD3 (xMBD3) are aligned with the corresponding sequences from the indicated Drosophila gene products. A 23 amino acid segment from the Drosophila MBD-like sequence (NNNASSNNNSSATASSNNNNNKV, SEQ ID NO: 1) has been omitted in the loop L1 to facilitate the alignment. Positions of beta strands, loops, and the alpha helix defined by the solution structures of MeCP2 and MBD1 are indicated above the alignment. Residues boxed in the alignment are identical or similar in all or all but one of the sequences depicted. Residues indicated by the symbol ψ define hydrophobic residues crucial for the basic fold of the motif. Residues indicated by the squares constitute the basic patch on one surface of the wedge structure. The two residues indicated by the diamond symbols are conserved hydrophobic residues critical for the structure of the hairpin loop.

FIG. 1B depicts the similarity of dMBD-like to Xenopus MBD3. The deduced amino acid sequences of Drosophila MBD-like and MBD-likeΔ are aligned with Xenopus MBD3 and MBD3-LF. Amino acids identical in all the proteins are shaded in dark gray; amino acids with similar side chain chemistry are shaded light gray and indicated by upward arrows. A box indicates the methyl CpG binding domains. The secondary structure of the methyl-CpG binding domain is indicated at the top. Arrows represent β-sheet segments; rectangles represent α-helices. Loops appear as thick lines.

FIG. 1C is an immunoblot depicting that immobilized dMBD-like fails to bind methylated DNA. The bottom two panels depict Southwestern assays performed with recombinant X. laevis MBD3, Drosophila MBD-like and MBD-likeΔ. The middle panel (labeled GAC12) is probed with the unmethylated DNA probe. The lower panel (labeled GAM12) is probed with the methylated probe. The top panel (labeled Coomassie) is a Coomassie Blue stained gel of lanes identical to those in the middle and lower panels. Each panel contains triplicate samples of the indicated protein.

FIG. 1D is an immunoblot depicting that dMBD-like fails to bind methylated DNA in solution. Xenopus MBD3, Drosophila MBD-like and MBD-likeΔ were examined for the ability to bind to methylated (GAM12) or unmethylated (GAC12) DNA probes. Binding reactions were performed as described in Example 1. Lanes 1-5 of each gel contain radiolabelled, unmethylated GAC12 as a probe and lanes 6-10 contain radiolabelled, fully methylated GAM12. For each gel, lanes 1 and 6 contain only the probe without any added protein. Lanes 2 and 7 contain 50 ng of protein, lanes 3 and 8 contain 75 ng of protein and lanes 4, 5, 9 and 10 contain 150 ng of protein. Binding was competed with either GAC12 or GAM12 as competitor (U, unmethylated GAC12; M, methylated GAM12) as indicated at the bottom of the figure.

FIG. 2A is an immunoblot showing that dMBD-likeΔ is the predominant form of dMBD-like protein found in Drosophila S2 cells. The immunoblot was prepared and analyzed with α-dMBD-like serum as described in Example 1. The lanes were loaded as follows: Lane 1, 5 μl S2 nuclear extract; Lane 2, 10 μl nuclear extract; Lane 3, recombinant dMBD-like; Lane 4, recombinant dMBD-likeΔ.

FIG. 2B shows association of dMBD-likeΔ with histone deacetylase activity in S2 nuclear extracts. Immunoprecipitations were performed as described in Example 1 on S2 nuclear extracts using the α-dMBD-like antiserum or pre-immune serum from the same rabbit. Precipitates were analyzed for HDAC activity using the deacetylase assay described in Example 1. Acetate released is indicated in the bar graph as cpm tritium. Samples are as follows: 1, no antiserum control; 2, pre-immune serum; 3, α-dMBD-like serum. Precipitations were performed multiple times; a representative example is depicted.

FIG. 2C shows association of dMBD-like with nucleosome-stimulated ATPase activity. Immunoprecipitations were performed on S2 nuclear extracts as described in FIG. 2B. Precipitated proteins were analyzed for ATPase activity as described in Example 1. The bar graph depicts inorganic phosphate produced in arbitrary units. Samples are as follows: 1, no antiserum control; 2, pre-immune serum; 3, α-dMBD-like serum. Light and dark bars correspond respectively to the absence and the presence of chicken erythrocyte mononucleosomes during the ATPase assay.

FIG. 3A is a schematic depicting partial resolution of dMBD-likeΔ from SIN3 and RPD3 by ion exchange chromatography. S2 nuclear extract was fractionated according to the scheme depicted in FIG. 3A and described in detail in Example 1. HDAC activity assays and immunoblot analysis of the indicated fractions from the MonoQ column are shown below the flow chart.

FIG. 3B is an immunoblot depicting coelution of dMBD-likeΔ with components of the Mi-2 complex on a gel filtration column. Fraction 24 from the MonoQ column was resolved on a Superose 6 gel filtration column as described in Example 1. Indicated fractions were analyzed by immunoblot using the antisera indicated.

FIG. 4A shows schematic depictions of the plasmids used for the transfection assays. A description of plasmid construction is presented in Example 1.

FIG. 4B depicts transcriptional repression as a function of dose of Gal4-tethered dMBD-like, dMBD-likeΔ, and Groucho. Experiments were performed in triplicate and error bars are shown.

FIG. 4C is an immunoblot showing that expression of the transiently transfected Gal4 derivatives is equivalent. Extracts from cells transfected with each of the indicated constructs were analyzed by immunoblot using either α-dMBD-like or α-Gal4 antisera.

FIG. 4D shows that TSA relieves repression by Gal4-Gro, Gal4-dMBD-like and Gal4-dMBD-likeΔ. The graph depicts luciferase activity from the G₅DE₅tkLuc reporter driven by the indicated Gal4 derivatives as a percentage of luciferase expression from the same reporter in the absence of any transfected Gal4 protein. All the samples are the average of triplicates. TSA was used at 100 nM and 400 nM as indicated in the figure.
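
The normalization described for FIG. 4D (triplicate readings averaged and expressed as a percentage of the reporter signal obtained without any transfected Gal4 protein) can be sketched as follows. This is a minimal illustration only; the function name and all readings are hypothetical and do not reproduce data from the figure.

```python
from statistics import mean


def percent_of_control(sample_readings, control_readings):
    """Mean sample luciferase signal as a percentage of the mean control signal."""
    control_mean = mean(control_readings)
    if control_mean == 0:
        raise ValueError("control mean is zero; percentage is undefined")
    return 100.0 * mean(sample_readings) / control_mean


# Hypothetical triplicate luciferase readings (arbitrary light units).
reporter_alone = [1000.0, 1100.0, 900.0]   # no transfected Gal4 protein
gal4_fusion = [240.0, 260.0, 250.0]        # reporter plus a Gal4 derivative

print(percent_of_control(gal4_fusion, reporter_alone))
```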

FIGS. 5A and 5B show regulation of dMBD-like and dMBD-likeΔ mRNA and protein expression during development.

FIG. 5A is a Northern (RNA) blot showing dMBD-like and dMBD-likeΔ expression through development. Total RNA isolated from various developmental stages (~10 μg/lane) was fractionated on a formaldehyde-agarose gel and transferred to a nylon membrane as described in Example 1. Lanes 1-3, embryonic stages: 0-3 h, 3-12 h, 12-24 h; lanes 4-6, larval stages: 1st, 2nd and 3rd instars; lane 7, male adult flies; lane 8, female adult flies.

FIG. 5B is an immunoblot showing dMBD-like and dMBD-likeΔ levels during development. Lanes 1-8 correspond to the same samples as in panel A. Equivalent amounts of protein were loaded in each lane.

DETAILED DESCRIPTION

Disclosed herein are compositions containing localization domains and methods for their preparation and use. The methods and compositions allow, for example, localization of corepression complexes either (1) to facilitate their recruitment to particular sites within chromatin by fusion of a localization domain to a DNA binding domain that can access such a site to repress gene activity or (2) to interfere with corepressive function, for example by attaching an activation domain to a DNA binding domain-localization domain fusion to affect repressive influences and promote gene activation.

In a preferred embodiment, a localization domain is a methyl binding domain (MBD) or a functional fragment thereof. Vertebrate methyl binding domain proteins are known to recognize and bind to CpG dinucleotide sequences in which the C residue is methylated. However, a surprising and unexpected ability of MBDs (including invertebrate MBDs which do not bind to methylated DNA) is their capacity to localize DNA, for example in corepression complexes. Thus, the methods and compositions disclosed herein allow for modulation of gene expression by employing a composition comprising a localization domain polypeptide or functional fragment thereof. The localization domain polypeptides can be selected for their ability to affect transcription, for example via their capacity to interact with corepression complexes and/or facilitate compartmentalization of target sequences in repressive compartments of the nucleus.

In one aspect, compositions and methods useful in modulating expression of a target gene are provided. The compositions typically comprise a fusion molecule comprising a localization domain and a DNA-binding domain. In one preferred embodiment, the localization domain comprises an MBD (or functional fragment thereof) and the DNA binding domain comprises a zinc finger protein (ZFP) or functional fragment thereof. In still further aspects, the compositions further comprise a transcriptional regulatory domain (a “functional domain”), for example an activation or repression domain.

Thus, it will be apparent to one of skill in the art that the use of localization domain(s) or functional fragments thereof will facilitate the regulation of many processes involving gene expression including, but not limited to, replication, recombination, repair, transcription, telomere function and maintenance, sister chromatid cohesion, mitotic chromosome segregation and, in addition, binding of transcription factors.

General

The practice of the disclosed methods, and the uses of the disclosed compositions, employ, unless otherwise indicated, conventional techniques in molecular biology, biochemistry, chromatin structure and analysis, computational chemistry, cell culture, recombinant DNA and related fields as are within the skill of the art. These techniques are fully explained in the literature. See, for example, Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989; Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL, Third edition, Cold Spring Harbor Laboratory Press, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, “Chromatin” (P. M. Wassarman and A. P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, “Chromatin Protocols” (P. B. Becker, ed.) Humana Press, Totowa, 1999.

The terms “nucleic acid,” “polynucleotide,” and “oligonucleotide” are used interchangeably and refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form. For the purposes of the present disclosure, these terms are not to be construed as limiting with respect to the length of a polymer. The terms can encompass known analogues of natural nucleotides, as well as nucleotides that are modified in the base, sugar and/or phosphate moieties. In general, an analogue of a particular nucleotide has the same base-pairing specificity; i.e., an analogue of A will base-pair with T. The terms also encompass nucleic acids containing modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs).

Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Nucleic acids include, for example, genes, cDNAs, and mRNAs. Polynucleotide sequences are displayed herein in the conventional 5′-3′ orientation.
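
To illustrate how a complementary sequence relates to a sequence written in the 5′-3′ orientation, the following minimal sketch derives the complementary strand of a DNA sequence and writes it 5′-3′ as well (the reverse complement). The function name and example sequence are hypothetical; only the four standard DNA bases are handled, and analogues or ambiguity codes are left unchanged.

```python
# Translation table for the standard Watson-Crick DNA base pairs (A-T, G-C).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand of a 5'-3' DNA sequence, also written 5'-3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical example sequence, for illustration only.
print(reverse_complement("ATGCCG"))
```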

Chromatin is the nucleoprotein structure comprising the cellular genome. “Cellular chromatin” comprises nucleic acid, primarily DNA, and protein, including histones and non-histone chromosomal proteins. The majority of eukaryotic cellular chromatin exists in the form of nucleosomes, wherein a nucleosome core comprises approximately 150 base pairs of DNA associated with an octamer comprising two each of histones H2A, H2B, H3 and H4; and linker DNA (of variable length depending on the organism) extends between nucleosome cores. A molecule of histone H1 (or its equivalent) is generally associated with the linker DNA. For the purposes of the present disclosure, the term “chromatin” is meant to encompass all types of cellular nucleoprotein, both prokaryotic and eukaryotic. Cellular chromatin includes both chromosomal and episomal chromatin, and includes both transcriptionally active chromatin (euchromatin) and transcriptionally inactive chromatin (heterochromatin).

A “chromosome” is a chromatin complex comprising all or a portion of the genome of a cell. The genome of a cell is often characterized by its karyotype, which is the collection of all the chromosomes that comprise the genome of the cell. The genome of a cell can comprise one or more chromosomes.

An “episome” is a replicating nucleic acid, nucleoprotein complex or other structure comprising a nucleic acid that is not part of the chromosomal karyotype of a cell. Examples of episomes include plasmids and certain viral genomes.

An “exogenous molecule” is a molecule that is not normally present in a cell, but can be introduced into a cell by one or more genetic, biochemical or other methods. Normal presence in the cell is determined with respect to the particular developmental stage and environmental conditions of the cell. Thus, for example, a molecule that is present only during embryonic development of muscle is an exogenous molecule with respect to an adult muscle cell. Similarly, a molecule induced by heat shock is an exogenous molecule with respect to a non-heat-shocked cell. An exogenous molecule can comprise, for example, a functioning version of a malfunctioning endogenous molecule or a malfunctioning version of a normally-functioning endogenous molecule.

An exogenous molecule can be, among other things, a small molecule, such as is generated by a combinatorial chemistry process, or a macromolecule such as a protein, nucleic acid, carbohydrate, lipid, glycoprotein, lipoprotein, polysaccharide, any modified derivative of the above molecules, or any complex comprising one or more of the above molecules. Nucleic acids include DNA and RNA; can be single- or double-stranded; can be linear, branched or circular; and can be of any length. Nucleic acids include those capable of forming duplexes, as well as triplex-forming nucleic acids. See, for example, U.S. Pat. Nos. 5,176,996 and 5,422,251. Proteins include, but are not limited to, DNA-binding proteins, transcription factors, chromatin remodeling factors, methylated DNA binding proteins, polymerases, methylases, demethylases, acetylases, deacetylases, kinases, phosphatases, integrases, recombinases, ligases, topoisomerases, gyrases and helicases.

An exogenous molecule can be the same type of molecule as an endogenous molecule, e.g., protein or nucleic acid (i.e., an exogenous gene), providing it has a sequence that is different from an endogenous molecule. For example, an exogenous nucleic acid can comprise an infecting viral genome, a plasmid or episome introduced into a cell, or a chromosome that is not normally present in the cell. Methods for the introduction of exogenous molecules into cells are known to those of skill in the art and include, but are not limited to, lipid-mediated transfer (i.e., liposomes, including neutral and cationic lipids), electroporation, direct injection, cell fusion, particle bombardment, calcium phosphate co-precipitation, DEAE-dextran-mediated transfer and viral vector-mediated transfer.

By contrast, an “endogenous molecule” is one that is normally present in a particular cell at a particular developmental stage under particular environmental conditions. For example, an endogenous nucleic acid can comprise an endogenous gene, a chromosome, the genome of a mitochondrion, chloroplast or other organelle, or a naturally-occurring episomal nucleic acid. Additional endogenous molecules can include proteins, for example, transcription factors and components of chromatin remodeling complexes.

A “fusion molecule” is a molecule in which two or more subunit molecules are linked, preferably covalently. The subunit molecules can be the same chemical type of molecule, or can be different chemical types of molecules. Examples of the first type of fusion molecule include, but are not limited to, fusion polypeptides (for example, a fusion between a ZFP DNA-binding domain and a methyl binding domain) and fusion nucleic acids (for example, a nucleic acid encoding the fusion polypeptide described supra). Examples of the second type of fusion molecule include, but are not limited to, a fusion between a triplex-forming nucleic acid and a polypeptide, and a fusion between a minor groove binder and a nucleic acid.

A “gene,” for the purposes of the present disclosure, includes a DNA region encoding a gene product (see infra), as well as all DNA regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.

“Gene expression” refers to the conversion of the information contained in a gene into a gene product. A gene product can be the direct transcriptional product of a gene (e.g., mRNA, tRNA, rRNA, antisense RNA, ribozyme, structural RNA or any other type of RNA) or a protein produced by translation of a mRNA. Gene products also include RNAs which are modified, by processes such as capping, polyadenylation, methylation, and editing, and proteins modified by, for example, methylation, acetylation, phosphorylation, ubiquitination, ADP-ribosylation, myristoylation, and glycosylation.

“Gene activation” and “augmentation of gene expression” refer to any process which results in an increase in production of a gene product. A gene product can be either RNA (including, but not limited to, mRNA, rRNA, tRNA, and structural RNA) or protein. Accordingly, gene activation includes those processes which increase transcription of a gene and/or translation of a mRNA. Examples of gene activation processes which increase transcription include, but are not limited to, those which facilitate formation of a transcription initiation complex, those which remodel chromatin into an active state, those which increase transcription initiation rate, those which increase transcription elongation rate, those which increase processivity of transcription and those which relieve transcriptional repression (by, for example, blocking the binding of a transcriptional repressor). Gene activation can constitute, for example, inhibition of repression as well as stimulation of expression above an existing level. Examples of gene activation processes which increase translation include those which increase translational initiation, those which increase translational elongation and those which increase mRNA stability. In general, gene activation comprises any detectable increase in the production of a gene product, preferably an increase in production of a gene product by about 2-fold, more preferably from about 2- to about 5-fold or any integer therebetween, more preferably between about 5- and about 10-fold or any integer therebetween, more preferably between about 10- and about 20-fold or any integer therebetween, still more preferably between about 20- and about 50-fold or any integer therebetween, more preferably between about 50- and about 100-fold or any integer therebetween, more preferably 100-fold or more.

“Gene repression” and “inhibition of gene expression” refer to any process which results in a decrease in production of a gene product. A gene product can be either RNA (including, but not limited to, mRNA, rRNA, tRNA, and structural RNA) or protein. Accordingly, gene repression includes those processes which decrease transcription of a gene and/or translation of a mRNA. Examples of gene repression processes which decrease transcription include, but are not limited to, those which inhibit formation of a transcription initiation complex, those which remodel chromatin into an inactive state, those which decrease transcription initiation rate, those which decrease transcription elongation rate, those which decrease processivity of transcription and those which antagonize transcriptional activation (by, for example, blocking the binding of a transcriptional activator). Gene repression can constitute, for example, prevention of activation as well as inhibition of expression below an existing level. Examples of gene repression processes which decrease translation include those which decrease translational initiation, those which decrease translational elongation and those which decrease mRNA stability. Transcriptional repression includes both reversible and irreversible inactivation of gene transcription. In general, gene repression comprises any detectable decrease in the production of a gene product, preferably a decrease in production of a gene product by about 2-fold, more preferably from about 2- to about 5-fold or any integer therebetween, more preferably between about 5- and about 10-fold or any integer therebetween, more preferably between about 10- and about 20-fold or any integer therebetween, still more preferably between about 20- and about 50-fold or any integer therebetween, more preferably between about 50- and about 100-fold or any integer therebetween, more preferably 100-fold or more. Most preferably, gene repression results in complete inhibition of gene expression, such that no gene product is detectable.

“Eucaryotic cells” include, but are not limited to, fungal cells (such as yeast), plant cells, insect cells, animal cells, teleost cells, mammalian cells and human cells.

The terms “operable linkage,” “operably linked,” “operative linkage” and “operatively linked” are used with reference to a juxtaposition of two or more components (such as sequence elements), in which the components are placed into a functional relationship with one another. Thus operatively linked components are arranged such that at least one of the components can mediate a function that is exerted upon at least one of the other components. By way of illustration, a transcriptional regulatory sequence, such as a promoter, is operatively linked to a coding sequence if the transcriptional regulatory sequence controls the level of transcription of the coding sequence in response to the presence or absence of one or more transcriptional regulatory factors. An operatively linked transcriptional regulatory sequence is generally joined in cis with a coding sequence, but need not be directly adjacent to it. For example, an enhancer can constitute a transcriptional regulatory sequence that is operatively linked to a coding sequence, even though they are not contiguous. Similarly, certain amino acid sequences that are non-contiguous in a primary polypeptide sequence may nonetheless be operably linked due to, for example, folding of a polypeptide chain.

With respect to fusion polypeptides, the terms “operably linked” and “operatively linked” can refer to the fact that each of the components performs the same function in linkage to the other component as it would if it were not so linked. For example, with respect to a fusion polypeptide in which a ZFP DNA-binding domain is fused to a transcriptional activation domain (or functional fragment thereof), the ZFP DNA-binding domain and the transcriptional activation domain (or functional fragment thereof) are in operative linkage if, in the fusion polypeptide, the ZFP DNA-binding domain portion is able to bind its target site and/or its binding site, while the transcriptional activation domain (or functional fragment thereof) is able to activate transcription.

A “functional fragment” of a protein, polypeptide or nucleic acid is a protein, polypeptide or nucleic acid whose sequence is not identical to its native or full-length counterpart, yet retains the same function as the native or full-length counterpart. A functional fragment can possess more, fewer, or the same number of residues as the corresponding native or full-length molecule, and/or can contain one or more amino acid or nucleotide analogues or substitutions. Methods for determining the function of a nucleic acid (e.g., coding function, ability to hybridize to another nucleic acid) are well-known in the art. Similarly, methods for determining protein function are well-known. For example, the DNA-binding function of a polypeptide can be determined by filter-binding, electrophoretic mobility-shift, or immunoprecipitation assays. See Ausubel et al., supra. The ability of a protein to interact with another protein can be determined, for example, by co-immunoprecipitation, two-hybrid assays or complementation, both genetic and biochemical. See, for example, Fields et al. (1989) Nature 340:245-246; U.S. Pat. No. 5,585,245 and PCT WO 98/44350.

The term “recombinant,” when used with reference to a cell, indicates that the cell replicates an exogenous nucleic acid, or expresses a peptide or protein encoded by an exogenous nucleic acid. Recombinant cells can contain genes that are not found within the native (non-recombinant) form of the cell. Recombinant cells can also contain genes found in the native form of the cell wherein the genes are modified and re-introduced into the cell by artificial means. The term also encompasses cells that contain a nucleic acid endogenous to the cell that has been modified without removing the nucleic acid from the cell; such modifications include those obtained by gene replacement, site-specific mutation, and related techniques. Thus, for example, recombinant cells express genes that are not found within the native (naturally occurring) form of the cell or express a second copy of a native gene that is otherwise normally or abnormally expressed, underexpressed or not expressed at all. Recombinant cells also include cells or cell lines derived from cells that have been modified as described.

The term “recombinant” when used with reference, e.g., to a nucleic acid, protein, or vector, refers to nucleic acids, proteins or vectors that have been modified by the introduction of heterologous nucleic acid or amino acid sequence, and includes any other alterations of a native nucleic acid or protein.

An “expression vector” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell, and optionally integration and/or replication of the expression vector in a host cell. The expression vector can be part of a plasmid, viral genome, or nucleic acid fragment, of viral or non-viral origin. Expression vectors can be, for example, naked DNA molecules, or can comprise nucleic acid of viral or nonviral origin packaged into viral particles. Typically, the expression vector includes an “expression cassette,” which comprises a nucleic acid to be transcribed operably linked to control elements that are capable of effecting expression of a nucleic acid that is operatively linked to the control elements in hosts compatible with such sequences. Expression cassettes include at least promoters and optionally, transcription termination signals. Typically, a recombinant expression cassette includes at least a nucleic acid to be transcribed (e.g., a nucleic acid encoding a desired polypeptide) and a promoter. Additional factors necessary or helpful in effecting expression can also be used, for example, an expression cassette can also include nucleotide sequences that encode a signal sequence that directs secretion of an expressed protein from the host cell. Transcription termination signals, enhancers, and other nucleic acid sequences that influence gene expression can also be included in an expression cassette.

The term “naturally occurring,” as applied to an object, means that the object can be found in nature.

The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are analogs or mimetics of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by phosphorylation, methylation, myristoylation, acetylation and/or the addition of carbohydrate residues to form glycoproteins. The terms “polypeptide,” “peptide” and “protein” include all of these modified polypeptides, as well as polypeptides comprising any additional covalent or non-covalent modification. Polypeptide sequences are displayed herein in the conventional N-terminal to C-terminal orientation.

A “subsequence” or “segment” when used in reference to a nucleic acid or polypeptide refers to a sequence of nucleotides or amino acids that comprise a part of a longer sequence of nucleotides or amino acids (e.g., a polypeptide), respectively.

The term “antibody” as used herein includes antibodies obtained from both polyclonal and monoclonal preparations, as well as the following: (i) hybrid (chimeric) antibody molecules (see, for example, Winter et al. (1991) Nature 349:293-299; and U.S. Pat. No. 4,816,567); (ii) F(ab′)2 and F(ab) fragments; (iii) Fv molecules (noncovalent heterodimers, see, for example, Inbar et al. (1972) Proc. Natl. Acad. Sci. USA 69:2659-2662; and Ehrlich et al. (1980) Biochem 19:4091-4096); (iv) single-chain Fv molecules (sFv) (see, for example, Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883); (v) dimeric and trimeric antibody fragment constructs; (vi) humanized antibody molecules (see, for example, Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al. (1988) Science 239:1534-1536; and U.K. Patent Publication No. GB 2,276,169, published 21 Sep. 1994); (vii) mini-antibodies or minibodies (i.e., sFv polypeptide chains that include oligomerization domains at their C-termini, separated from the sFv by a hinge region; see, e.g., Pack et al. (1992) Biochem 31:1579-1584; Cumber et al. (1992) J. Immunology 149B:120-126); and (viii) any functional fragments obtained from such molecules, wherein such fragments retain specific-binding properties of the parent antibody molecule.

“Specific binding” between an antibody or other binding agent and an antigen, or between two binding partners, means that the dissociation constant for the interaction is less than 10⁻⁶ M. Preferred antibody/antigen or binding partner complexes have a dissociation constant of less than about 10⁻⁷ M, and preferably 10⁻⁸ M to 10⁻⁹ M or 10⁻¹⁰ M or lower.

A “binding protein” or “binding domain” is a protein or polypeptide that is able to bind non-covalently to another molecule. A binding protein can bind to, for example, a DNA molecule (a DNA-binding domain), an RNA molecule (an RNA-binding domain) and/or a protein molecule (a protein-binding domain). In the case of a protein-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins. A binding domain can have more than one type of binding activity. For example, zinc finger proteins have DNA-binding, RNA-binding and protein-binding activity.

A “zinc finger binding protein” is a protein or polypeptide that binds DNA, RNA and/or protein, preferably in a sequence-specific manner, as a result of stabilization of protein structure through coordination of a zinc ion. The term zinc finger binding protein is often abbreviated as zinc finger protein or ZFP. The individual DNA binding domains are typically referred to as “fingers.” A ZFP has at least one finger, typically two fingers, three fingers, or six fingers. Each finger binds from two to four base pairs of DNA, typically three or four base pairs of DNA. A ZFP binds to a nucleic acid sequence called a target site or target segment. Each finger typically comprises an approximately 30 amino acid, zinc-chelating, DNA-binding subdomain. An exemplary motif characterizing one class of these proteins (C₂H₂ class) is -Cys-(X)₂₋₄-Cys-(X)₁₂-His-(X)₃₋₅-His (where X is any amino acid). Studies have demonstrated that a single zinc finger of this class consists of an alpha helix containing the two invariant histidine residues co-ordinated with zinc along with the two cysteine residues of a single beta turn (see, e.g., Berg & Shi, Science 271:1081-1085 (1996)).
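By way of illustration only, the exemplary C₂H₂ motif recited above can be expressed as a regular expression over one-letter amino acid codes. The following is a minimal sketch; the function name and the test sequence are hypothetical and are not taken from any protein disclosed herein. A match indicates only that a sequence fits the spacing pattern, not that it folds into or functions as a zinc finger.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any amino acid.
# "." stands in for X; the quantifiers carry the allowed spacings.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein_sequence):
    """Return (start_index, matched_text) for each non-overlapping motif match.

    Hypothetical helper for illustration; indices are 0-based.
    """
    sequence = protein_sequence.upper()
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(sequence)]


# Synthetic (non-natural) sequence built to satisfy the spacing rules:
# C + 2 residues + C + 12 residues + H + 3 residues + H
synthetic_finger = "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
matches = find_c2h2_motifs("GG" + synthetic_finger + "GG")
```

In this sketch the synthetic sequence is matched at index 2, immediately after the two flanking glycine residues.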

Zinc finger proteins can be engineered to bind to predetermined sequences. Examples of zinc finger engineering include designed zinc finger proteins and selected zinc finger proteins. A “designed” zinc finger protein is a protein not occurring in nature whose structure and composition result principally from rational criteria. Rational criteria for design include application of substitution rules and computerized algorithms for processing information in a database storing information of existing ZFP designs and binding data, for example as described in PCT WO 98/53058, WO 98/53059, WO 99/53060 and WO 00/42219. A “selected” zinc finger protein is a protein not found in nature whose production results primarily from an empirical process such as phage display. See e.g., U.S. Pat. No. 5,789,538; U.S. Pat. No. 6,007,988; U.S. Pat. No. 6,013,453; WO 95/19431; WO 96/06166; WO 98/53057 and WO 98/54311.

A “target site” or “target sequence” is a sequence that is bound by a binding protein such as, for example, a ZFP. Target sequences can be nucleotide sequences (either DNA or RNA) or amino acid sequences. A single target site typically has about four to about ten base pairs, but can be as long as 18-20 base pairs, e.g., for a six-finger ZFP. Typically, a two-fingered ZFP recognizes a four to seven base pair target site, and a three-fingered ZFP recognizes a six to ten base pair target site. By way of example, a DNA target sequence for a three-finger ZFP is generally either 9 or 10 nucleotides in length, depending upon the presence and/or nature of cross-strand interactions between the ZFP and the target sequence. Target sequences can be found in any DNA or RNA sequence, including regulatory sequences, exons, introns, or any non-coding sequence.

A “target subsite” or “subsite” is the portion of a DNA target site that is bound by a single zinc finger, excluding cross-strand interactions. Thus, in the absence of cross-strand interactions, a subsite is generally three nucleotides in length. In cases in which a cross-strand interaction occurs (e.g., a “D-able subsite,” as described for example in co-owned PCT WO 00/42219, incorporated by reference in its entirety herein) a subsite is four nucleotides in length and overlaps with another 3- or 4-nucleotide subsite.
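The relationship between finger number and target site length described above can be sketched arithmetically. The following illustration rests on simplifying assumptions that are not part of the definitions: subsites are contiguous, each contributes three nucleotides, and overlapping four-nucleotide subsites extend the overall site by at most one nucleotide. The function name is hypothetical. Under these assumptions a three-finger ZFP yields the 9 or 10 nucleotide range stated above; longer sites (e.g., 18-20 base pairs for a six-finger ZFP) are not captured by this simple model.

```python
NUCLEOTIDES_PER_SUBSITE = 3  # subsite length without cross-strand interaction


def target_site_length_range(n_fingers):
    """Return (min_length, max_length) in nucleotides for a contiguous target site.

    Assumption for illustration: cross-strand interactions add at most one
    nucleotide overall, because four-nucleotide subsites overlap their
    neighbouring subsite.
    """
    if n_fingers < 1:
        raise ValueError("a ZFP has at least one finger")
    base_length = NUCLEOTIDES_PER_SUBSITE * n_fingers
    return (base_length, base_length + 1)


three_finger_range = target_site_length_range(3)
```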

“K_(d)” refers to the dissociation constant for the compound, i.e., the concentration of a compound (e.g., a zinc finger protein) that gives half maximal binding of the compound to its target (i.e., half of the compound molecules are bound to the target) under given conditions (i.e., when [target] << K_(d)), as measured using a given assay system (see, e.g., U.S. Pat. No. 5,789,538). Any assay system can be used, as long as it gives an accurate measurement of the actual K_(d). In one embodiment, the K_(d) for a ZFP is measured using an electrophoretic mobility shift assay (“EMSA”), as described, for example, in WO 00/441566 and WO 00/42219.
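The half-maximal binding condition can be illustrated with the standard single-site binding model, in which the occupied fraction equals [compound] / ([compound] + K_(d)) when the target concentration is far below K_(d). The sketch below assumes that model; it is not a description of any particular assay system, and the function name is hypothetical.

```python
def fraction_bound(compound_conc_molar, kd_molar):
    """Occupied fraction under a single-site model, assuming [target] << Kd.

    Both arguments are molar concentrations. Illustrative helper only.
    """
    if compound_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return compound_conc_molar / (compound_conc_molar + kd_molar)


# When the compound concentration equals Kd, binding is half maximal.
half_maximal = fraction_bound(1e-9, 1e-9)
```

Under this model, a ninefold excess of compound over K_(d) gives about 90% occupancy, which is why titrations for K_(d) measurement typically span concentrations on both sides of the expected value.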

“Administering” an expression vector, nucleic acid, ZFP, or a delivery vehicle to a cell comprises transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a protein or nucleic acid can be transported across a cell membrane and preferably into the nucleus of a cell.

The term “effective amount” includes that amount which results in the desired result, for example, repression of an active gene, activation of a repressed gene, or inhibition of transcription of a structural gene or translation of RNA.

A “delivery vehicle” refers to a compound, e.g., a liposome, toxin, or a membrane translocation polypeptide, which is used to administer an exogenous molecule. Delivery vehicles can be used, for example, to administer nucleic acids encoding fusion molecules such as, for example, ZFP-localization domain fusions. Exemplary delivery vehicles include lipid:nucleic acid complexes, expression vectors, viruses, and the like.

The term “modulate” refers to a change in the quantity, degree or extent of a function. For example, the modified zinc finger-nucleotide binding polypeptides disclosed herein may modulate the activity of a promoter sequence by binding to a motif within the promoter, thereby inducing, enhancing or suppressing transcription of a gene operatively linked to the promoter sequence. Alternatively, modulation may include inhibition of transcription of a gene wherein the modified zinc finger-nucleotide binding polypeptide binds to the structural gene and blocks DNA-dependent RNA polymerase from reading through the gene, thus inhibiting transcription of the gene. The structural gene may be a normal cellular gene or an oncogene, for example. Alternatively, modulation may include inhibition of translation of a transcript. Thus, “modulation” of gene expression includes both gene activation and gene repression.

Modulation can be assayed by determining any parameter that is indirectly or directly affected by the expression of the target gene. Such parameters include, e.g., changes in RNA or protein levels; changes in protein activity; changes in product levels; changes in downstream gene expression; changes in transcription or activity of reporter genes such as, for example, luciferase, CAT, beta-galactosidase, or GFP (see, e.g., Misteli & Spector, (1997) Nature Biotechnology 15:961-964); changes in signal transduction; changes in phosphorylation and dephosphorylation; changes in receptor-ligand interactions; changes in concentrations of second messengers such as, for example, cGMP, cAMP, IP₃, and Ca²⁺; changes in cell growth, changes in neovascularization, and/or changes in any functional effect of gene expression. Measurements can be made in vitro, in vivo, and/or ex vivo. Such functional effects can be measured by conventional methods, e.g., measurement of RNA or protein levels, measurement of RNA stability, and/or identification of downstream or reporter gene expression. Readout can be by way of, for example, chemiluminescence, fluorescence, colorimetric reactions, antibody binding, inducible markers, ligand binding assays; changes in intracellular second messengers such as cGMP and inositol trisphosphate (IP₃); changes in intracellular calcium levels; cytokine release, and the like.

Accordingly, the terms “modulating expression,” “inhibiting expression” and “activating expression” of a gene can refer to the ability of a molecule to activate or inhibit transcription of a gene. Activation includes prevention of transcriptional inhibition (i.e., prevention of repression of gene expression) and inhibition includes prevention of transcriptional activation (i.e., prevention of gene activation).

To determine the level of gene expression modulation by a ZFP, cells contacted with ZFPs are compared to control cells, e.g., without the zinc finger protein or with a non-specific ZFP, to examine the extent of inhibition or activation. Control samples are assigned a relative gene expression activity value of 100%. Modulation/inhibition of gene expression is achieved when the gene expression activity value relative to the control is about 80%, preferably 50% (i.e., 0.5× the activity of the control), more preferably 25%, more preferably 5-0%. Modulation/activation of gene expression is achieved when the gene expression activity value relative to the control is 110%, more preferably 150% (i.e., 1.5× the activity of the control), more preferably 200-500%, more preferably 1000-2000% or more.
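The comparison to control cells described above can be sketched as a short calculation. In the following illustration the function names are hypothetical, the signal values are arbitrary placeholder numbers rather than experimental data, and the reading of “about 80%” as “80% or less” and of “110%” as “110% or more” is an assumption made for the purpose of the example.

```python
def relative_activity(sample_signal, control_signal):
    """Express a sample's expression signal as a percentage of the control (100%)."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * sample_signal / control_signal


def classify_modulation(percent_of_control,
                        inhibition_cutoff=80.0,
                        activation_cutoff=110.0):
    """Classify a relative activity value using the thresholds recited above.

    Assumption: values at or below the inhibition cutoff count as inhibition,
    values at or above the activation cutoff count as activation.
    """
    if percent_of_control <= inhibition_cutoff:
        return "inhibition"
    if percent_of_control >= activation_cutoff:
        return "activation"
    return "no modulation"


# Placeholder reporter readings (arbitrary units), not measured values.
repressed = classify_modulation(relative_activity(50.0, 100.0))
activated = classify_modulation(relative_activity(150.0, 100.0))
```

The stricter preferred thresholds (e.g., 50% or 25% for inhibition, 150% or more for activation) can be applied by passing different cutoff values.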

A “promoter” is defined as an array of nucleic acid control sequences that direct transcription. As used herein, a promoter typically includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of certain RNA polymerase II type promoters, a TATA element, enhancer, CCAAT box, SP-1 site, etc. As used herein, a promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription. The promoters often have an element that is responsive to transactivation by a DNA-binding moiety such as a polypeptide, e.g., a nuclear receptor, Gal4, the lac repressor and the like.

A “constitutive” promoter is a promoter that is active under most environmental and developmental conditions. An “inducible” promoter is a promoter that is active under certain environmental or developmental conditions.

A “regulatory domain” or “functional domain” refers to a protein or a polypeptide sequence (or portion thereof) that has transcriptional modulation activity, or that is capable of interacting with proteins and/or protein domains that have transcriptional modulation activity. Such proteins include, e.g., transcription factors and co-factors (e.g., KRAB, MAD, ERD, SID, nuclear factor kappa B subunit p65, early growth response factor 1, and nuclear hormone receptors, VP16, VP64), endonucleases, integrases, recombinases, methyltransferases, histone acetyltransferases, histone deacetylases and polypeptides which are components of a chromatin remodeling complex, and their functional fragments. Exemplary components of chromatin remodeling complexes are disclosed in co-owned PCT/US01/40616, the disclosure of which is incorporated by reference herein in its entirety. A functional domain can be covalently or non-covalently linked to a DNA-binding domain (e.g., a ZFP) to modulate transcription of a gene of interest. Alternatively, some binding domains, such as, for example, ZFPs, can act in the absence of a functional domain to modulate transcription. Furthermore, transcription of a gene of interest can be modulated by a binding domain, such as a ZFP, linked to multiple functional domains.

The term “heterologous” is a relative term, which when used with reference to portions of a nucleic acid indicates that the nucleic acid comprises two or more subsequences that are not found in the same relationship to each other in nature. For instance, a nucleic acid that is recombinantly produced typically has two or more sequences from unrelated genes synthetically arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source. The two nucleic acids are thus heterologous to each other in this context. When added to a cell, the recombinant nucleic acids would also be heterologous to the endogenous genes of the cell. Thus, in a cell, a heterologous nucleic acid would include a recombinant nucleic acid that has integrated into the chromosome, or a recombinant extrachromosomal nucleic acid.

Similarly, a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a “fusion protein,” where the two subsequences are encoded by a single nucleic acid sequence). See, e.g., Ausubel, supra, for an introduction to recombinant techniques.

By “host cell” is meant a cell that contains one or more exogenous molecules such as, for example, expression vectors and/or heterologous nucleic acids. The host cell typically supports the replication or expression of an expression vector. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as fungal cells (e.g., yeast), protozoal cells, plant cells, insect cells, animal cells, avian cells, teleost cells, amphibian cells, mammalian cells, primate cells or human cells. Exemplary mammalian cell lines include CHO, HeLa, 293, COS-1, and the like, e.g., cultured cells (in vitro), explants and primary cultures (in vitro and ex vivo), and cells in vivo.

The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, and methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.

“Conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon in an amino acid sequence herein, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
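The degeneracy described above can be illustrated by enumerating synonymous codons from the standard genetic code. The sketch below builds the standard RNA codon table and counts the coding sequences that encode a given peptide; the function names and the three-residue example peptide are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code in RNA form; codons are ordered U, C, A, G at each
# position, with the third position varying fastest. "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return the sorted list of codons encoding a one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_coding_sequences(peptide):
    """Count distinct coding sequences (silent variants included) for a peptide."""
    return prod(len(synonymous_codons(residue)) for residue in peptide)


alanine_codons = synonymous_codons("A")
# Hypothetical tripeptide Met-Ala-Trp: only the alanine codon can vary.
variants_for_maw = count_coding_sequences("MAW")
```

In this example alanine has the four codons named above, while methionine and tryptophan each have a single codon, so the tripeptide Met-Ala-Trp is encoded by four distinct sequences.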

As to amino acid and nucleic acid sequences, individual substitutions, deletions or additions that alter, add or delete a single amino acid or nucleotide or a small percentage of amino acids or nucleotides in the sequence create a “conservatively modified variant,” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants and alleles. See, e.g., Creighton, Proteins (1984) for a discussion of amino acid properties.

Localization Domains

Transcriptionally inactive regions of chromatin (e.g., telomeres, heterochromatin, matrix attachment regions, scaffold attachment regions, centromeres) have been observed to occupy distinct nuclear addresses. See, for example, Cockell et al. (1999) Curr. Opin. Genet. Devel. 9:199-205; Mahy et al. (2000) in “Chromatin Structure and Gene Expression,” Second Edition (S. C. R. Elgin & J. L. Workman, eds.) Oxford University Press, Oxford, pp. 300-321 and references therein. Thus, there exists, in at least some cases, a correlation between transcriptional activity and nuclear localization. Moreover, certain nuclear proteins have been observed to be localized to specific regions within the nucleus. For example, the HP1 protein is localized to regions of the nucleus that are rich in transcriptionally inactive heterochromatin. See, for example, Eissenberg et al. (2000) Curr. Opin. Genet. Devel. 10:204-210. This heterochromatic localization of HP1 is mediated, at least in part, by a region of the HP1 protein known as the chromodomain. See, for example, Platero et al. (1995) EMBO J. 14:3977-3986. One property of HP1-type chromodomains is their ability to bind to histone H3 that is methylated at lysine 9. See, for example, Lachner et al. (2001) Nature 410:116-120. Thus, exemplary localization domains include HP1 and the chromodomain, which is also found in a number of other proteins in addition to HP1.

Additional examples of correlations between intranuclear localization and transcriptional regulatory activity are provided by certain proteins involved in generating and recognizing methylated chromosomal DNA. Methylation of cytosine within CpG dinucleotide sequences in chromosomal DNA often leads to transcriptional repression of genes associated with these methylated sequences. Two types of proteins are directly involved with CpG methylation: DNA-N-methyl transferases (DNMTs), which catalyze the methylation reaction, and methylated DNA binding proteins (known as MBDs because they possess a methylated DNA binding domain), which bind to methylated DNA and mediate certain transcriptional effects of DNA methylation. Both of these classes of proteins possess transcriptional regulatory activities in addition to their methylation, or methylated DNA-binding, activities. These additional activities are related to the ability of these proteins to recruit transcriptional regulatory and chromatin remodeling proteins and/or to localize to discrete nuclear compartments, thereby drawing bound DNA into the compartment in which the protein is localized.

Accordingly, additional exemplary localization domains include DNMTs and methylated DNA-binding domains (MBDs).

A. DNA-N-Methyl Transferases

The DNA methyltransferases Dnmt3a and Dnmt3b are responsible for cytosine methylation of CpG dinucleotide sequences. CpG methylation is often associated with transcriptional repression, especially in the context of CpG islands located at or near the promoter of many mammalian genes. However, DNA methyltransferases also possess transcriptional repression activity that is independent of their ability to methylate DNA and which involves association with histone deacetylases (HDACs). See, e.g., Rountree et al. (2000) Nature Genet. 25:269-277; Robertson et al. (2000) Nature Genet. 25:338-342; Fuks et al. (2001) EMBO J. 20:2536-2544. DNA methyltransferases are also able to localize to heterochromatic regions of the nucleus; this localizing ability resides in the N-terminal region of these proteins. See, for example, Bachman et al. (2001) J. Biol. Chem. 276:32,282-32,287. Thus, the transcriptional repression activity of DNMTs and related proteins is due, at least in part, to their ability to recruit HDACs and to localize DNA sequences to which they are bound to heterochromatic regions of the nucleus.

Accordingly, a DNMT, or functional fragment thereof, can serve as a localization domain in the practice of the disclosed methods and the use of the disclosed compositions. Exemplary DNMTs include, but are not limited to, DNMT1, DNMT2, DNMT3a, and DNMT3b. See also Robertson (2001) Oncogene 20:3139-3155.

B. Methyl Binding Domains

In vertebrates, methyl-CpG-binding domain proteins comprise two functional domains: one which binds to methylated CpG dinucleotides and one which appears to be involved in transcriptional silencing. It is known that components of certain chromatin remodeling complexes bind to methylated DNA. Chromatin remodeling complexes from human (NRD/NURD complex) and amphibian cells (Mi-2 complex) contain a nucleosome-dependent ATPase activity called Mi-2 (also known as CHD). Additional protein components of the amphibian Mi-2 complex include Mta1-like (a DNA-binding protein homologous to metastasis-associated protein), RPD3 (the amphibian homologue of histone deacetylases HDAC1 and HDAC2), RbAp48 (a protein which interacts with histone H4), and MBD3 (a protein containing a methylated CpG binding domain). The amphibian complex additionally contains a serine- and proline-rich subunit, p66. Activities of the amphibian Mi-2 complex include a nucleosome-dependent ATPase that is not stimulated by free histones or DNA, translational movement of histone octamers relative to DNA, and deacetylation of core histones within a nucleosome. Guschin et al. (2000) Biochemistry 39:5238-5245; Wade et al. (1999) Nature Genet. 23:62-66.

As described in the Examples below, Applicants have identified a structural motif in invertebrates (which lack DNA methylation) that is homologous to the vertebrate MBD and is a component of a Mi-2-like complex. The results described herein indicate that these MBDs fulfill additional functions besides binding methylated DNA. For example, invertebrate MBDs appear to be included in a chromatin remodeling complex (the Mi-2 complex, see Examples) and are also able to repress transcription when fused to the Gal4 DNA binding domain (see Examples). Thus, the term methyl CpG binding domain or “MBD” as used herein refers to polypeptides sharing the identified structural motif and functions (e.g., as components of chromatin remodeling complexes; as agents of transcriptional repression, corepressors, etc.). Accordingly, MBDs may, but need not, bind to methylated CpG residues.

The methylated DNA-binding proteins MBD2 and MBD3 have been shown to localize to heterochromatic regions of the nucleus. Hendrich et al. (1998) Mol. Cell. Biol. 18:6538-6547. Additional proteins which possess the ability to localize to heterochromatin include HP1 and DNA-N-methyltransferases (see supra). Accordingly, in one embodiment, the compositions and methods described herein are directed to using a localization domain to facilitate the recruitment of corepression complexes to a particular site within chromatin, by fusion of the localization domain to a DNA binding domain that can access such a site, thereby repressing gene activity. In other aspects, the localization domain is used to interfere with corepression complexes and function by constructing fusion molecules containing a localization domain, a DNA binding domain and one or more regulatory domains that influence gene expression (e.g., activation domains, repression domains and/or components of a chromatin remodeling complex).

Any MBD having the requisite function and specificity is suitable. Thus, the MBD can be from any species. In certain embodiments, the MBD is derived from Drosophila MBD family members, for example dMBD-like and dMBD-likeΔ proteins described in the Examples. In other embodiments, the MBD is derived from vertebrate (e.g., mammalian) MBD proteins, for example, MBD1, MBD2, MBD3, MBD4, MeCP1 and MeCP2. See, for example, Bird et al. (1999) Cell 99:451-454. To give but one example, the methylated DNA-binding protein MeCP2 comprises identifiable transcriptional repression functions and methylated DNA-binding functions, and localizes to heterochromatin. Nan et al. (1993) Nucleic Acids Res. 21:4886-4892. Accordingly, these regions of MeCP2 can be used as localization domains.

It will be clear from the disclosure that the term “localization domain,” as used herein, refers to a molecule capable, either actively or passively, of taking up a particular intranuclear address, such address often constituting a nuclear compartment having specific characteristics related to transcriptional activity. The term is to be distinguished from the terms “nuclear localization sequence” and “nuclear localization signal” which refer to sequences responsible for transport of a polypeptide from the cytoplasm into the nucleus.

For the purposes of this disclosure, it is intended that the term “localization domain” additionally encompass those proteins or polypeptides, or functional fragments thereof, that associate or interact with a protein or protein domain capable of being localized. For example, the KRAB transcription regulatory domain interacts with the KAP-1 protein, which, in turn, interacts with HP1, which is localized to heterochromatin (see supra). Matsuda et al. (2001) J. Biol. Chem. 276:14,222-14,229. Accordingly, proteins such as KAP-1 and KRAB, as well as any other proteins capable of being localized, either intrinsically or through association with one or more additional proteins, can serve as a localization domain.

DNA-Binding Domains

In certain embodiments, the compositions and methods disclosed herein involve fusions between a DNA-binding domain and a localization domain. In additional embodiments, the compositions and methods disclosed herein involve fusions between a DNA-binding domain and a domain which participates in modulation of gene expression (i.e., a regulatory domain) such as, for example, a transcriptional activation domain, a transcriptional repression domain or a component of a chromatin remodeling complex. A DNA-binding domain can comprise any molecular entity capable of sequence-specific binding to chromosomal DNA. Binding can be mediated by electrostatic interactions, hydrophobic interactions, or any other type of chemical interaction. Examples of moieties which can comprise part of a DNA-binding domain include, but are not limited to, minor groove binders, major groove binders, antibiotics, intercalating agents, peptides, polypeptides, oligonucleotides, and nucleic acids. An example of a DNA-binding nucleic acid is a triplex-forming oligonucleotide.

Minor groove binders include substances which, by virtue of their steric and/or electrostatic properties, interact preferentially with the minor groove of double-stranded nucleic acids. Certain minor groove binders exhibit a preference for particular sequence compositions. For instance, netropsin, distamycin and CC-1065 are examples of minor groove binders which bind specifically to AT-rich sequences, particularly runs of A or T. WO 96/32496.
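
By way of illustration only, candidate sites for an AT-selective minor groove binder could be located computationally by scanning a sequence for uninterrupted runs of A or T. The following is a minimal sketch in Python; the function name `find_at_runs` and the default minimum run length of four bases are assumptions made for this example and are not taken from the disclosure or from WO 96/32496.

```python
import re


def find_at_runs(seq, min_len=4):
    """Return (start, end, run) for each uninterrupted run of A/T bases.

    min_len is a hypothetical threshold chosen for illustration; positions
    are 0-based and the end index is exclusive.
    """
    pattern = re.compile(r"[AT]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence, not a real target site.
    for start, end, run in find_at_runs("GCGCAATTTAGCGC"):
        print(start, end, run)
```

Such a scan identifies sequence composition only; it does not predict binding affinity or chromatin accessibility.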

Many antibiotics are known to exert their effects by binding to DNA. Binding of antibiotics to DNA is often sequence-specific or exhibits sequence preferences. Actinomycin, for instance, is a relatively GC-specific DNA binding agent.

In a preferred embodiment, a DNA-binding domain is a polypeptide. Certain peptide and polypeptide sequences bind to double-stranded DNA in a sequence-specific manner. For example, certain transcription factors participate in transcription initiation by RNA Polymerase II through sequence-specific interactions with DNA in the promoter and/or enhancer regions of genes. Defined regions within the polypeptide sequence of various transcription factors have been shown to be responsible for sequence-specific binding to DNA. See, for example, Pabo et al. (1992) Ann. Rev. Biochem. 61:1053-1095 and references cited therein. These regions include, but are not limited to, motifs known as leucine zippers, helix-loop-helix (HLH) domains, helix-turn-helix domains, zinc fingers, β-sheet motifs, steroid receptor motifs, bZIP domains, homeodomains, AT-hooks and others. The amino acid sequences of these motifs are known and, in some cases, amino acids that are critical for sequence specificity have been identified. Polypeptides involved in other processes involving DNA, such as replication, recombination and repair, will also have regions involved in specific interactions with DNA. Peptide sequences involved in specific DNA recognition, such as those found in proteins involved in transcription, replication, recombination and repair, can be obtained through recombinant DNA cloning and expression techniques or by chemical synthesis, and can be attached to other components of a fusion molecule by methods known in the art.

In a more preferred embodiment, a DNA-binding domain comprises a zinc finger DNA-binding domain (ZFP). See, for example, Miller et al. (1985) EMBO J. 4:1609-1614; Rhodes et al. (1993) Scientific American Feb.:56-65; and Klug (1999) J. Mol. Biol. 293:215-218. In one embodiment, a target site for a zinc finger DNA-binding domain is identified according to site selection rules disclosed in co-owned WO 00/42219. ZFP DNA-binding domains are designed and/or selected to recognize a particular target site as described in co-owned WO 00/42219 and WO 00/41566; as well as U.S. Pat. Nos. 5,789,538; 6,007,408; and 6,013,453; and PCT publications WO 95/19431, WO 98/53057, WO 98/53058, WO 98/53059, WO 98/53060, WO 98/54311, WO 00/23464 and WO 00/27878.

Certain DNA-binding domains are capable of binding to DNA that is packaged in nucleosomes. See, for example, Cordingley et al. (1987) Cell 48:261-270; Pina et al. (1990) Cell 60:719-731; and Cirillo et al. (1998) EMBO J. 17:244-254. Certain ZFP-containing proteins such as, for example, members of the nuclear hormone receptor superfamily, are capable of binding DNA sequences packaged into chromatin. These include, but are not limited to, the glucocorticoid receptor and the thyroid hormone receptor. Archer et al. (1992) Science 255:1573-1576; Wong et al. (1997) EMBO J. 16:7130-7145. Other DNA-binding domains, including certain ZFP-containing binding domains, require more accessible DNA for binding. In the latter case, the binding specificity of the DNA-binding domain can be determined by identifying accessible regions in the cellular chromatin. Accessible regions can be determined as described, for example, in co-owned PCT/US01/13631 and PCT/US01/40617, the disclosures of which are incorporated by reference herein in their entireties. A DNA-binding domain is then designed and/or selected to bind to a target site within the accessible region.

Fusion Molecules

The discovery that localization domains are involved in transcriptional corepression complexes in different vertebrate and invertebrate species also allows for the design of fusion molecules which facilitate regulation of gene expression. Thus, in certain embodiments, the compositions and methods disclosed herein involve fusions between a DNA-binding domain and a localization domain (such as, for example, an MBD) or functional fragment, as described supra, or a polynucleotide encoding such a fusion. In this way, a localization domain is brought into proximity with a sequence in a gene that is bound by the DNA-binding domain. The transcriptional repression function of the localization domain is then able to act on the gene, by recruiting additional corepressors and/or by transporting the bound gene to a repressive compartment of the nucleus.

In additional embodiments, targeted remodeling of chromatin, as disclosed in co-owned PCT/US01/40606 (the disclosure of which is incorporated by reference herein in its entirety), can be used to generate one or more sites in cellular chromatin that are accessible to the binding of a localization domain/DNA binding domain fusion molecule.

Fusion molecules are constructed by methods of cloning and biochemical conjugation that are well-known to those of skill in the art. Fusion molecules comprise a DNA-binding domain and a localization domain or a functional fragment thereof. In certain embodiments, fusion molecules comprise a DNA-binding domain, a localization domain, and a regulatory domain (e.g., a transcriptional activation or repression domain or a component of a chromatin remodeling complex). Fusion molecules also optionally comprise nuclear localization signals (such as, for example, that from the SV40 large T-antigen) and epitope tags (such as, for example, FLAG, myc and hemagglutinin). Fusion proteins (and nucleic acids encoding them) are designed such that the translational reading frame is preserved among the components of the fusion.
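
The requirement that the translational reading frame be preserved among the components of a fusion can be checked at the design stage. The following is a minimal sketch in Python, assuming each component is supplied as a coding sequence in N-terminal to C-terminal order; the function name `check_in_frame` and the toy sequences are hypothetical and serve only to illustrate the check. As a simplification, each component is examined independently, so a component whose length is not a multiple of three is reported but the downstream frame shift it would cause is not propagated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_in_frame(components):
    """Report reading-frame problems in an ordered list of coding sequences.

    components: list of (name, coding_sequence) tuples, N- to C-terminal.
    Returns a list of problem descriptions; an empty list means every
    component is a whole number of codons and no stop codon occurs before
    the final codon of the last component.
    """
    problems = []
    last = len(components) - 1
    for i, (name, cds) in enumerate(components):
        cds = cds.upper()
        if len(cds) % 3 != 0:
            problems.append(f"{name}: length {len(cds)} is not a multiple of 3")
            continue
        codons = [cds[j:j + 3] for j in range(0, len(cds), 3)]
        for k, codon in enumerate(codons):
            is_terminal = i == last and k == len(codons) - 1
            if codon in STOP_CODONS and not is_terminal:
                problems.append(f"{name}: premature stop codon {codon} at codon {k + 1}")
    return problems


if __name__ == "__main__":
    # Toy sequences only; these are not real domain sequences.
    design = [("NLS", "ATGCCA"), ("DBD", "GGATCC"), ("LD", "AAATAA")]
    print(check_in_frame(design))
```

An empty result indicates only that the joined coding sequence can be translated as a single open reading frame; it says nothing about folding or function of the resulting fusion protein.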

Fusions between a polypeptide component of a localization domain (or a functional fragment thereof) on the one hand, and a non-protein DNA-binding domain (e.g., antibiotic, intercalator, minor groove binder, nucleic acid) on the other, are constructed by methods of biochemical conjugation known to those of skill in the art. See, for example, the Pierce Chemical Company (Rockford, Ill.) Catalogue. Methods and compositions for making fusions between a minor groove binder and a polypeptide have been described. Mapp et al. (2000) Proc. Natl. Acad. Sci. USA 97:3930-3935.

The fusion molecules disclosed herein comprise a DNA-binding domain which binds to a target site. In certain embodiments, the target site is present in an accessible region of cellular chromatin. Accessible regions can be determined as described, for example, in co-owned PCT/US01/13631 and PCT/US01/40617, the disclosures of which are hereby incorporated by reference herein in their entireties. If the target site is not present in an accessible region of cellular chromatin, one or more accessible regions can be generated as described in co-owned PCT/US01/40616, the disclosure of which is hereby incorporated by reference herein in its entirety. In additional embodiments, the DNA-binding domain of a fusion molecule is capable of binding to cellular chromatin regardless of whether its target site is in an accessible region or not. For example, such DNA-binding domains are capable of binding to linker DNA and/or nucleosomal DNA. Examples of this type of “pioneer” DNA binding domain are found in certain steroid receptors and in hepatocyte nuclear factor 3 (HNF3). Cordingley et al. (1987) Cell 48:261-270; Pina et al. (1990) Cell 60:719-731; and Cirillo et al. (1998) EMBO J. 17:244-254.

Methods of chromatin modification or binding using a localization domain can be combined with methods involving binding of endogenous or exogenous transcriptional regulators in the region of interest to achieve modulation of gene expression. Modulation of gene expression can be in the form of repression as, for example, when the target gene resides in a pathological infecting microorganism or in an endogenous gene of the subject, such as an oncogene or a viral receptor, that contributes to a disease state. Further, as described supra, repression of a specific target gene can be achieved by using a fusion molecule comprising a localization domain (or functional fragment thereof) and a DNA-binding domain, for compartmentalizing the target DNA (and related gene) into a transcriptionally repressed nuclear location.

Alternatively, modulation can be in the form of activation, for example, if activation of a gene (e.g., a tumor suppressor gene) can ameliorate a disease state. In this case, a cell is contacted with a fusion molecule comprising a localization domain, a DNA-binding domain and a transcriptional activation domain. The localization domain portion of the fusion molecule localizes it to the repressive compartment of the nucleus, where the DNA-binding domain is able to access the target gene. The activation domain is then able to activate transcription of the silenced gene, by removing it from the repressive nuclear compartment and/or by recruiting additional coactivators that overcome the repressive environment of the target gene. These embodiments are particularly suitable for the reactivation of genes whose expression has been silenced during development, as such developmental silencing mechanisms often depend upon methylation of the silenced gene.

A further exemplary method for reactivation of a gene located in a repressive nuclear compartment is to utilize a fusion comprising a localization domain, a DNA-binding domain and a component of a chromatin remodeling complex. In this case, the localization domain localizes the fusion molecule to a repressive nuclear compartment, in which the DNA-binding portion of the fusion molecule gains access to the target gene. The chromatin remodeling component is able to assemble an active chromatin remodeling complex on the target gene, resulting in modification of the chromatin structure on the target gene into a transcriptionally active conformation.

Additional embodiments involve the use of a fusion molecule comprising a DNA-binding domain and a localization domain, in combination with a second molecule having transcriptional regulatory activity which binds in the region of interest, to regulate expression of one or more target genes. In certain embodiments, the second molecule comprises a fusion between a DNA-binding domain and either a transcriptional activation domain or a transcriptional repression domain. Any polypeptide sequence or domain capable of influencing gene expression, which can be fused to a DNA-binding domain, is suitable for use. Activation and repression domains are known to those of skill in the art and are disclosed, for example, in co-owned WO 00/41566.

Exemplary activation domains include, but are not limited to, VP16, VP64, p300, CBP, PCAF, SRC1, PvALF, AtHD2A and ERF-2. See, for example, Robyr et al. (2000) Mol. Endocrinol. 14:329-347; Collingwood et al. (1999) J. Mol. Endocrinol. 23:255-275; Leo et al. (2000) Gene 245:1-11; Manteuffel-Cymborowska (1999) Acta Biochim. Pol. 46:77-89; McKenna et al. (1999) J. Steroid Biochem. Mol. Biol. 69:3-12; Malik et al. (2000) Trends Biochem. Sci. 25:277-283; and Lemon et al. (1999) Curr. Opin. Genet. Dev. 9:499-504. Additional exemplary activation domains include, but are not limited to, OsGAI, HALF-1, C1, AP1, ARF-5, -6, -7, and -8, CPRF1, CPRF4, MYC-RP/GP, and TRAB1. See, for example, Ogawa et al. (2000) Gene 245:21-29; Okanami et al. (1996) Genes Cells 1:87-99; Goff et al. (1991) Genes Dev. 5:298-309; Cho et al. (1999) Plant Mol. Biol. 40:419-429; Ulmasov et al. (1999) Proc. Natl. Acad. Sci. USA 96:5844-5849; Sprenger-Haussels et al. (2000) Plant J. 22:1-8; Gong et al. (1999) Plant Mol. Biol. 41:33-44; and Hobo et al. (1999) Proc. Natl. Acad. Sci. USA 96:15,348-15,353.

Exemplary repression domains include, but are not limited to, KRAB, SID, MBD2, MBD3, members of the DNMT family (e.g., DNMT1, DNMT3A, DNMT3B), Rb, and MeCP2. See, for example, Bird et al. (1999) Cell 99:451-454; Tyler et al. (1999) Cell 99:443-446; Knoepfler et al. (1999) Cell 99:447-450; and Robertson et al. (2000) Nature Genet. 25:338-342. Additional exemplary repression domains include, but are not limited to, ROM2 and AtHD2A. See, for example, Chern et al. (1996) Plant Cell 8:305-321; and Wu et al. (2000) Plant J. 22:19-27.

Common regulatory domains for use in a fusion molecule include, e.g., effector domains from transcription factors (activators, repressors, co-activators, co-repressors), silencers, nuclear hormone receptors, oncogene transcription factors (e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, mos family members etc.); DNA repair enzymes and their associated factors and modifiers; DNA rearrangement enzymes and their associated factors and modifiers; chromatin associated proteins and their modifiers (e.g., kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g., methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, polymerases, endonucleases) and their associated factors and modifiers.

Transcription factor polypeptides from which one can obtain a regulatory domain include those that are involved in regulated and basal transcription. Such polypeptides include transcription factors, their effector domains, coactivators, silencers, nuclear hormone receptors (see, e.g., Goodrich et al., Cell 84:825-30 (1996) for a review of proteins and nucleic acid elements involved in transcription; transcription factors in general are reviewed in Barnes & Adcock, Clin. Exp. Allergy 25 Suppl. 2:46-9 (1995) and Roeder, Methods Enzymol. 273:165-71 (1996)). Databases dedicated to transcription factors are known (see, e.g., Science 269:630 (1995)). Nuclear hormone receptor transcription factors are described in, for example, Rosen et al., J. Med. Chem. 38:4855-74 (1995). The C/EBP family of transcription factors is reviewed in Wedel et al., Immunobiology 193:171-85 (1995). Coactivators and co-repressors that mediate transcription regulation by nuclear hormone receptors are reviewed in, for example, Meier, Eur. J. Endocrinol. 134(2):158-9 (1996); Kaiser et al., Trends Biochem. Sci. 21:342-5 (1996); and Utley et al., Nature 394:498-502 (1998). GATA transcription factors, which are involved in regulation of hematopoiesis, are described in, for example, Simon, Nat. Genet. 11:9-11 (1995); Weiss et al., Exp. Hematol. 23:99-107. TATA box binding protein (TBP) and its associated TAF polypeptides (which include TAF30, TAF55, TAF80, TAF110, TAF150, and TAF250) are described in Goodrich & Tjian, Curr. Opin. Cell Biol. 6:403-9 (1994) and Hurley, Curr. Opin. Struct. Biol. 6:69-75 (1996). The STAT family of transcription factors is reviewed in, for example, Barahmand-Pour et al., Curr. Top. Microbiol. Immunol. 211:121-8 (1996). Transcription factors involved in disease are reviewed in Aso et al., J. Clin. Invest. 97:1561-9 (1996).

In one embodiment, the KRAB repression domain from the human KOX-1 protein is used as a repression domain (Thiesen et al., New Biologist 2:363-374 (1990); Margolin et al., PNAS 91:4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al., PNAS 91:4514-4518 (1994); see also Example 3). In another embodiment, KAP-1, a KRAB co-repressor, is used with KRAB (Friedman et al., Genes Dev. 10:2067-2078 (1996)). Other preferred transcription factors and transcription factor domains that act as transcriptional repressors include MAD (see, e.g., Sommer et al., J. Biol. Chem. 273:6632-6642 (1998); Gupta et al., Oncogene 16:1149-1159 (1998); Queva et al., Oncogene 16:967-977 (1998); Larsson et al., Oncogene 15:737-748 (1997); Laherty et al., Cell 89:349-356 (1997); and Cultraro et al., Mol Cell. Biol. 17:2353-2359 (1997)); FKHR (forkhead in rhabdomyosarcoma gene; Ginsberg et al., Cancer Res. 15:3542-3546 (1998); Epstein et al., Mol. Cell. Biol. 18:4118-4130 (1998)); EGR-1 (early growth response gene product-1; Yan et al., PNAS 95:8298-8303 (1998); and Liu et al., Cancer Gene Ther. 5:3-28 (1998)); the ets2 repressor factor repressor domain (ERD; Sgouras et al., EMBO J. 14:4781-4793 (1995)); and the MAD mSIN3 interaction domain (SID; Ayer et al., Mol. Cell. Biol. 16:5772-5781 (1996)).

In one embodiment, the HSV VP16 activation domain is used as a transcriptional activator (see, e.g., Hagmann et al., J. Virol. 71:5952-5962 (1997)). Other preferred transcription factors that could supply activation domains include the VP64 activation domain (Seipel et al., EMBO J. 11:4961-4968 (1996)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. Cell. Biol. 10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik, J. Virol. 72:5610-5618 (1998) and Doyle & Hunt, Neuroreport 8:2937-2942 (1997)); and EGR-1 (early growth response gene product-1; Yan et al., PNAS 95:8298-8303 (1998); and Liu et al., Cancer Gene Ther. 5:3-28 (1998)).

Kinases, phosphatases, and other proteins that modify polypeptides involved in gene regulation are also useful as regulatory domains for use in fusion molecules. Such modifiers are often involved in switching on or off transcription mediated by, for example, hormones. Kinases involved in transcription regulation are reviewed in Davis, Mol. Reprod. Dev. 42:459-67 (1995), Jackson et al., Adv. Second Messenger Phosphoprotein Res. 28:279-86 (1993), and Boulikas, Crit. Rev. Eukaryot. Gene Expr. 5:1-77 (1995), while phosphatases are reviewed in, for example, Schonthal, Semin. Cancer Biol. 6:239-48 (1995). Nuclear tyrosine kinases are described in Wang, Trends Biochem. Sci. 19:373-6 (1994).

As described, useful domains can also be obtained from the gene products of oncogenes (e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, mos, erb family members) and their associated factors and modifiers. Oncogenes are described in, for example, Cooper, Oncogenes, 2nd ed., The Jones and Bartlett Series in Biology, Boston, Mass., Jones and Bartlett Publishers, 1995. The ets transcription factors are reviewed in Wasylyk et al., Eur. J. Biochem. 211:7-18 (1993) and Crepieux et al., Crit. Rev. Oncog. 5:615-38 (1994). Myc oncogenes are reviewed in, for example, Ryan et al., Biochem. J. 314:713-21 (1996). The jun and fos transcription factors are described in, for example, The Fos and Jun Families of Transcription Factors, Angel & Herrlich, eds. (1994). The max oncogene is reviewed in Hurlin et al., Cold Spring Harb. Symp. Quant. Biol. 59:109-16. The myb gene family is reviewed in Kanei-Ishii et al., Curr. Top. Microbiol. Immunol. 211:89-98 (1996). The mos family is reviewed in Yew et al., Curr. Opin. Genet. Dev. 3:19-25 (1993).

Regulatory domains can also be obtained from DNA replication and repair enzymes and their associated factors and modifiers. DNA repair systems are reviewed in, for example, Vos, Curr. Opin. Cell Biol. 4:385-95 (1992); Sancar, Ann. Rev. Genet. 29:69-105 (1995); Lehmann, Genet. Eng. 17:1-19 (1995); and Wood, Ann. Rev. Biochem. 65:135-67 (1996). DNA rearrangement enzymes and their associated factors and modifiers can also be used as regulatory domains (see, e.g., Gangloff et al., Experientia 50:261-9 (1994); Sadowski, FASEB J. 7:760-7 (1993)).

Similarly, regulatory domains can be derived from DNA modifying enzymes (e.g., DNA methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, polymerases) and their associated factors and modifiers. Helicases are reviewed in Matson et al., Bioessays, 16:13-22 (1994), and methyltransferases are described in Cheng, Curr. Opin. Struct. Biol. 5:4-10 (1995). Chromatin associated proteins and their modifiers (e.g., kinases, methylases, acetylases and deacetylases), such as histone deacetylase (Wolffe, Science 272:371-2 (1996)), are also useful as regulatory domains for use in a fusion molecule. In one embodiment, the regulatory domain is a DNA methyltransferase that acts as a transcriptional repressor (see, e.g., Van den Wyngaert et al., FEBS Lett. 426:283-289 (1998); Flynn et al., J. Mol. Biol. 279:101-116 (1998); Okano et al., Nucleic Acids Res. 26:2536-2540 (1998); and Zardo & Caiafa, J. Biol. Chem. 273:16517-16520 (1998)). In another embodiment, endonucleases such as FokI are used as transcriptional repressors, which act via gene cleavage (see, e.g., WO 94/18313 and WO 95/09233).

Factors that control chromatin and DNA structure, movement and localization and their associated factors and modifiers; factors derived from microbes (e.g., prokaryotes, eukaryotes and viruses) and factors that associate with or modify them can also be used in the synthesis of fusion molecules. In one embodiment, recombinases and integrases are used as regulatory domains. In one embodiment, histone acetyltransferase is used as a transcriptional activator (see, e.g., Jin & Scotto, Mol. Cell. Biol. 18:4377-4384 (1998); Wolffe, Science 272:371-372 (1996); Taunton et al., Science 272:408-411 (1996); and Hassig et al., PNAS 95:3519-3524 (1998)). In another embodiment, histone deacetylase is used as a transcriptional repressor (see, e.g., Jin & Scotto, Mol. Cell. Biol. 18:4377-4384 (1998); Syntichaki & Thireos, J. Biol. Chem. 273:24414-24419 (1998); Sakaguchi et al., Genes Dev. 12:2831-2841 (1998); and Martinez et al., J. Biol. Chem. 273:23781-23785 (1998)).

Another suitable repression domain is methyl binding domain protein 2B (MBD-2B) (see also Hendrich et al. (1999) Mamm Genome 10:906-912 for description of MBD proteins). Another useful repression domain is that associated with the v-ErbA protein (see infra). See, for example, Damm, et al. (1989) Nature 339:593-597; Evans (1989) Int. J. Cancer Suppl. 4:26-28; Pain et al. (1990) New Biol. 2:284-294; Sap et al. (1989) Nature 340:242-244; Zenke et al. (1988) Cell 52:107-119; and Zenke et al. (1990) Cell 61:1035-1049. Additional exemplary repression domains include, but are not limited to, thyroid hormone receptor (TR, see infra), SID, MBD1, MBD2, MBD3, MBD4, MBD-like proteins, members of the DNMT family (e.g., DNMT1, DNMT3A, DNMT3B), Rb, MeCP1 and MeCP2. See, for example, Bird et al. (1999) Cell 99:451-454; Tyler et al. (1999) Cell 99:443-446; Knoepfler et al. (1999) Cell 99:447-450; and Robertson et al. (2000) Nature Genet. 25:338-342. Additional exemplary repression domains include, but are not limited to, ROM2 and AtHD2A. See, for example, Chern et al. (1996) Plant Cell 8:305-321; and Wu et al. (2000) Plant J. 22:19-27.

Certain members of the nuclear hormone receptor (NHR) superfamily, including, for example, thyroid hormone receptors (TRs) and retinoic acid receptors (RARs), are among the most potent transcriptional regulators currently known. Zhang et al., Annu. Rev. Physiol. 62:439-466 (2000) and Sucov et al., Mol Neurobiol 10(2-3):169-184 (1995). In the absence of their cognate ligand, these proteins bind with high specificity and affinity to short stretches of DNA (e.g., 12-17 base pairs) within regulatory loci (e.g., enhancers and promoters) and effect robust transcriptional repression of adjacent genes. The potency of their regulatory action stems from the concurrent use of two distinct functional pathways to drive gene silencing: (i) the creation of a localized domain of repressive chromatin via the targeting of a complex between the corepressor N-CoR and a histone deacetylase, HDAC3 (Guenther et al., Genes Dev 14:1048-1057 (2000); Urnov et al., EMBO J 19:4074-4090 (2000); Li et al., EMBO J 19:4342-4350 (2000) and Underhill et al., J. Biol. Chem. 275:40463-40470 (2000)) and (ii) a chromatin-independent pathway (Urnov et al., supra) that may involve direct interference with the function of the basal transcription machinery (Fondell et al., Genes Dev 7(7B):1400-1410 (1993) and Fondell et al., Mol Cell Biol 16:281-287 (1996)).

In the presence of very low (e.g., nanomolar) concentrations of their ligand, these receptors undergo a conformational change which leads to the release of corepressors, recruitment of a different class of auxiliary molecules (e.g., coactivators) and potent transcriptional activation. Collingwood et al., J. Mol. Endocrinol. 23(3):255-275 (1999).

The portion of the receptor protein responsible for transcriptional control (e.g., repression and activation) can be physically separated from the portion responsible for DNA binding, and retains full functionality when tethered to other polypeptides, for example, other DNA-binding domains. Accordingly, a nuclear hormone receptor transcription control domain can be used as a portion of a fusion molecule, such that the transcriptional regulatory activity of the receptor can be targeted to a chromosomal region of interest (e.g., a gene) by virtue of a DNA-binding domain (e.g., a ZFP binding domain).

Moreover, the structure of TR and other nuclear hormone receptors can be altered, either naturally or through recombinant techniques, such that the receptor loses all capacity to respond to hormone (thus losing its ability to drive transcriptional activation), but retains the ability to effect transcriptional repression. This approach is exemplified by the transcriptional regulatory properties of the oncoprotein v-ErbA. The v-ErbA protein is one of the two proteins required for leukemic transformation of immature red blood cell precursors in young chicks by the avian erythroblastosis virus. TR is a major regulator of erythropoiesis (Beug et al., Biochim Biophys Acta 1288(3):M35-47 (1996)); in particular, in its unliganded state, it represses genes required for cell cycle arrest and the differentiated state. Thus, the administration of thyroid hormone to immature erythroblasts leads to their rapid differentiation. The v-ErbA oncoprotein is an extensively mutated version of TR; these mutations include: (i) deletion of 12 amino-terminal amino acids; (ii) fusion to the gag oncoprotein; (iii) several point mutations in the DNA binding domain that alter the DNA binding specificity of the protein relative to its parent, TR, and impair its ability to heterodimerize with the retinoid X receptor; (iv) multiple point mutations in the ligand-binding domain of the protein that effectively eliminate the capacity to bind thyroid hormone; and (v) a deletion of a carboxy-terminal stretch of amino acids that is essential for transcriptional activation. Stunnenberg et al., Biochim Biophys Acta 1423(1):F15-33 (1999). As a consequence of these mutations, v-ErbA retains the capacity to bind to naturally occurring TR target genes and is an effective transcriptional repressor when bound (Urnov et al., supra; Sap et al., Nature 340:242-244 (1989); and Ciana et al., EMBO J. 17(24):7382-7394 (1999)). In contrast to TR, however, v-ErbA is completely insensitive to thyroid hormone, and thus maintains transcriptional repression in the face of a challenge from any concentration of thyroids or retinoids, whether endogenous to the medium, or added by the investigator.

This functional property of v-ErbA is retained when its repression domain is fused to a heterologous, synthetic DNA binding domain. Accordingly, in one aspect, v-ErbA or its functional fragments are used as a repression domain. In additional embodiments, TR or its functional domains are used as a repression domain in the absence of ligand and/or as an activation domain in the presence of ligand (e.g., 3,5,3′-triiodo-L-thyronine or T3). Thus, TR can be used as a switchable functional domain (i.e., a bifunctional domain); its activity (activation or repression) being dependent upon the presence or absence (respectively) of ligand.

Additional exemplary repression domains are obtained from the DAX protein and its functional fragments. Zazopoulos et al., Nature 390:311-315 (1997). In particular, the C-terminal portion of DAX-1, including amino acids 245-470, has been shown to possess repression activity. Altincicek et al., J. Biol. Chem. 275:7662-7667 (2000). A further exemplary repression domain is the RBP1 protein and its functional fragments. Lai et al., Oncogene 18:2091-2100 (1999); Lai et al., Mol. Cell. Biol. 19:6632-6641 (1999); Lai et al., Mol. Cell. Biol. 21:2918-2932 (2001) and WO 01/04296. The full-length RBP1 polypeptide contains 1257 amino acids. Exemplary functional fragments of RBP1 are a polypeptide comprising amino acids 1114-1257, and a polypeptide comprising amino acids 243-452.
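The residue ranges given for functional fragments can be read as 1-based, inclusive positions in the full-length polypeptide. The following is a minimal sketch of that bookkeeping only; the function name `functional_fragment` is hypothetical, and the sequence used is a placeholder of the stated RBP1 length, not the actual RBP1 sequence.

```python
def functional_fragment(sequence, first, last):
    """Return residues first..last (1-based, inclusive) of a protein sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence first - 1.
    return sequence[first - 1:last]


# Placeholder only: a string of the stated full length (1257 residues),
# NOT the real RBP1 amino acid sequence.
placeholder_rbp1 = "A" * 1257

c_terminal_fragment = functional_fragment(placeholder_rbp1, 1114, 1257)
internal_fragment = functional_fragment(placeholder_rbp1, 243, 452)

print(len(c_terminal_fragment))  # 144 residues
print(len(internal_fragment))    # 210 residues
```

The same arithmetic (last - first + 1 residues) applies to the other residue ranges recited in this section.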

Members of the TIEG family of transcription factors contain three repression domains known as R1, R2 and R3. Repression by TIEG family proteins is achieved at least in part through recruitment of mSIN3A histone deacetylase complexes. Cook et al. (1999) J. Biol. Chem. 274:29,500-29,504; Zhang et al. (2001) Mol. Cell. Biol. 21:5041-5049. Any or all of these repression domains (or their functional fragments) can be fused alone, or in combination with additional repression domains (or their functional fragments), to a DNA-binding domain to generate a targeted exogenous repressor molecule.

Furthermore, the product of the human cytomegalovirus (HCMV) UL34 open reading frame acts as a transcriptional repressor of certain HCMV genes, for example, the US3 gene. LaPierre et al. (2001) J. Virol. 75:6062-6069. Accordingly, the UL34 gene product, or functional fragments thereof, can be used as a component of a fusion molecule. Nucleic acids encoding such fusions are also useful in the methods and compositions disclosed herein.

Yet another exemplary repression domain is the CDF-1 transcription factor and/or its functional fragments. See, for example, WO 99/27092.

The Ikaros family of proteins is involved in the regulation of lymphocyte development, at least in part by transcriptional repression. Accordingly, an Ikaros family member (e.g., Ikaros, Aiolos) or a functional fragment thereof can be used as a repression domain. See, for example, Sabbattini et al. (2001) EMBO J. 20:2812-2822.

The yeast Ash1p protein comprises a transcriptional repression domain. Maxon et al. (2001) Proc. Natl. Acad. Sci. USA 98:1495-1500. Accordingly, the Ash1p protein, its functional fragments, and homologues of Ash1p, such as those found, for example, in vertebrate, mammalian, and plant cells, can serve as a repression domain for use in the methods and compositions disclosed herein.

Additional exemplary repression domains include those derived from histone deacetylases (HDACs, e.g., Class I HDACs, Class II HDACs, SIR-2 homologues), HDAC-interacting proteins (e.g., SIN3, SAP30, SAP15, NCoR, SMRT, RB, p107, p130, RBAP46/48, MTA, Mi-2, Brg1, Brm), DNA-cytosine methyltransferases (e.g., Dnmt1, Dnmt3a, Dnmt3b), proteins that bind methylated DNA (e.g., MBD1, MBD2, MBD3, MBD4, MeCP2, DMAP1), protein methyltransferases (e.g., lysine and arginine methylases, SuVar homologues such as Suv39H1), polycomb-type repressors (e.g., Bmi-1, eed1, RING1, RYBP, E2F6, Mel18, YY1 and CtBP), viral repressors (e.g., adenovirus E1b 55K protein, cytomegalovirus UL34 protein, viral oncogenes such as v-erbA), hormone receptors (e.g., Dax-1, estrogen receptor, thyroid hormone receptor), and repression domains associated with naturally-occurring zinc finger proteins (e.g., WT1, KAP1). Further exemplary repression domains include members of the polycomb complex and their homologues, HPH1, HPH2, HPC2, NC2, groucho, Eve, tramtrak, mHP1, SIP1, ZEB1, ZEB2, and Enx1/Ezh2. In all of these cases, either the full-length protein or a functional fragment can be used as a repression domain in a fusion molecule. Furthermore, any homologues of the aforementioned proteins can also be used as repression domains, as can proteins (or their functional fragments) that interact with any of the aforementioned proteins.

Additional repression domains, and exemplary functional fragments, are as follows. Hes1 is a human homologue of the Drosophila hairy gene product and comprises a functional fragment encompassing amino acids 910-1014. In particular, a WRPW (trp-arg-pro-trp) motif can act as a repression domain. Fisher et al. (1996) Mol. Cell. Biol. 16:2670-2677.

The TLE1, TLE2 and TLE3 proteins are human homologues of the Drosophila groucho gene product. Functional fragments of these proteins possessing repression activity reside between amino acids 1-400. Fisher et al., supra.

The Tbx3 protein possesses a functional repression domain between amino acids 524-721. He et al. (1999) Proc. Natl. Acad. Sci. USA 96:10,212-10,217. The Tbx2 gene product is involved in repression of the p14/p16 genes and contains a region between amino acids 504-702 that is homologous to the repression domain of Tbx3; accordingly, Tbx2 and/or this functional fragment can be used as a repression domain. Carreira et al. (1998) Mol. Cell. Biol. 18:5,099-5,108.

The human Ezh2 protein is a homologue of Drosophila enhancer of zeste and recruits the eed1 polycomb-type repressor. A region of the Ezh2 protein comprising amino acids 1-193 can interact with eed1 and repress transcription; accordingly, Ezh2 and/or this functional fragment can be used as a repression domain. Denisenko et al. (1998) Mol. Cell. Biol. 18:5634-5642.

The RYBP protein is a corepressor that interacts with polycomb complex members and with the YY1 transcription factor. A region of RYBP comprising amino acids 42-208 has been identified as a functional repression domain. Garcia et al. (1999) EMBO J. 18:3404-3418.

The RING finger protein RING1A is a member of two different vertebrate polycomb-type complexes, contains multiple binding sites for various components of the polycomb complex, and possesses transcriptional repression activity. Accordingly, RING1A or its functional fragments can serve as a repression domain. Satjin et al. (1997) Mol. Cell. Biol. 17:4105-4113.

The Bmi-1 protein is a member of a vertebrate polycomb complex and is involved in transcriptional silencing. It contains multiple binding sites for various polycomb complex components. Accordingly, Bmi-1 and its functional fragments are useful as repression domains. Gunster et al. (1997) Mol. Cell. Biol. 17:2326-2335; Hemenway et al. (1998) Oncogene 16:2541-2547.

The E2F6 protein is a member of the mammalian Bmi-1-containing polycomb complex and is a transcriptional repressor that is capable of recruiting RYBP, Bmi-1 and RING1A. A functional fragment of E2F6 comprising amino acids 129-281 acts as a transcriptional repression domain. Accordingly, E2F6 and its functional fragments can be used as repression domains. Trimarchi et al. (2001) Proc. Natl. Acad. Sci. USA 98:1519-1524.

The eed1 protein represses transcription at least in part through recruitment of histone deacetylases (e.g., HDAC2). Repression activity resides in both the N- and C-terminal regions of the protein. Accordingly, eed1 and its functional fragments can be used as repression domains. van der Vlag et al. (1999) Nature Genet. 23:474-478.

The CTBP2 protein represses transcription at least in part through recruitment of an HPC2-polycomb complex. Accordingly, CTBP2 and its functional fragments are useful as repression domains. Richard et al. (1999) Mol. Cell. Biol. 19:777-787.

Neuron-restrictive silencer factors (NRSFs) are proteins that repress expression of neuron-specific genes. Accordingly, an NRSF or functional fragment thereof can serve as a repression domain. See, for example, U.S. Pat. No. 6,270,990.

It will be clear to those of skill in the art that, in the formation of a fusion protein (or a nucleic acid encoding same) between a DNA-binding domain and a regulatory domain, either a repressor or a molecule that interacts with a repressor is suitable as a repression domain. Essentially any molecule capable of recruiting a repressive complex and/or repressive activity (such as, for example, histone deacetylation) to the target gene is useful as a repression domain of a fusion protein.

Additional exemplary activation domains include, but are not limited to, p300, CBP, PCAF, SRC1, PvALF, AtHD2A and ERF-2. See, for example, Robyr et al. (2000) Mol. Endocrinol. 14:329-347; Collingwood et al. (1999) J. Mol. Endocrinol. 23:255-275; Leo et al. (2000) Gene 245:1-11; Manteuffel-Cymborowska (1999) Acta Biochim. Pol. 46:77-89; McKenna et al. (1999) J. Steroid Biochem. Mol. Biol. 69:3-12; Malik et al. (2000) Trends Biochem. Sci. 25:277-283; and Lemon et al. (1999) Curr. Opin. Genet. Dev. 9:499-504. Further exemplary activation domains include, but are not limited to, OsGAI, HALF-1, C1, AP1, ARF-5, -6, -7, and -8, CPRF1, CPRF4, MYC-RP/GP, and TRAB1. See, for example, Ogawa et al. (2000) Gene 245:21-29; Okanami et al. (1996) Genes Cells 1:87-99; Goff et al. (1991) Genes Dev. 5:298-309; Cho et al. (1999) Plant Mol. Biol. 40:419-429; Ulmason et al. (1999) Proc. Natl. Acad. Sci. USA 96:5844-5849; Sprenger-Haussels et al. (2000) Plant J. 22:1-8; Gong et al. (1999) Plant Mol. Biol. 41:33-44; and Hobo et al. (1999) Proc. Natl. Acad. Sci. USA 96:15,348-15,353.

It will be clear to those of skill in the art that, in the formation of a fusion protein (or a nucleic acid encoding same), either an activator or a molecule that interacts with an activator is suitable as a regulatory domain. Essentially any molecule capable of recruiting an activating complex and/or activating activity (such as, for example, histone acetylation) to the target gene is useful as an activating domain of a fusion molecule.

Chromatin remodeling proteins and components of chromatin remodeling complexes for use as regulatory domains in fusion molecules are described, for example, in co-owned PCT application US01/40616, the disclosure of which is hereby incorporated by reference in its entirety.

In a further embodiment, a DNA-binding domain (e.g., a zinc finger domain) is fused to a bifunctional domain (BFD). A bifunctional domain is a transcriptional regulatory domain whose activity depends upon interaction of the BFD with a second molecule. The second molecule can be any type of molecule capable of influencing the functional properties of the BFD including, but not limited to, a compound, a small molecule, a peptide, a protein, a polysaccharide or a nucleic acid. An exemplary BFD is the ligand binding domain of the estrogen receptor (ER). In the presence of estradiol, the ER ligand binding domain acts as a transcriptional activator; while, in the absence of estradiol and the presence of tamoxifen or 4-hydroxy-tamoxifen, it acts as a transcriptional repressor. Another example of a BFD is the thyroid hormone receptor (TR) ligand binding domain which, in the absence of ligand, acts as a transcriptional repressor and, in the presence of thyroid hormone (T3), acts as a transcriptional activator. An additional BFD is the glucocorticoid receptor (GR) ligand binding domain. In the presence of dexamethasone, this domain acts as a transcriptional activator; while, in the presence of RU486, it acts as a transcriptional repressor. An additional exemplary BFD is the ligand binding domain of the retinoic acid receptor. In the presence of its ligand all-trans-retinoic acid, the retinoic acid receptor recruits a number of co-activator complexes and activates transcription. In the absence of ligand, the retinoic acid receptor is not capable of recruiting transcriptional co-activators. Additional BFDs are known to those of skill in the art. See, for example, U.S. Pat. Nos. 5,834,266 and 5,994,313 and PCT WO 99/10508.
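The ligand-dependent behavior of the exemplary BFDs described above can be summarized as a lookup from (receptor, ligand) to transcriptional activity. The following is a minimal illustrative sketch only; the table name `BFD_ACTIVITY`, the function `bfd_activity`, and the short receptor labels (including "RAR" for the retinoic acid receptor) are hypothetical conveniences, and combinations not described in the text are reported as such rather than guessed.

```python
# (receptor, ligand) -> activity, transcribed from the description above.
# A ligand of None means "no ligand present".
BFD_ACTIVITY = {
    ("ER", "estradiol"): "activator",
    ("ER", "tamoxifen"): "repressor",
    ("ER", "4-hydroxy-tamoxifen"): "repressor",
    ("TR", None): "repressor",
    ("TR", "T3"): "activator",
    ("GR", "dexamethasone"): "activator",
    ("GR", "RU486"): "repressor",
    ("RAR", "all-trans-retinoic acid"): "activator",
    # The text states only that co-activators are not recruited without ligand.
    ("RAR", None): "no co-activator recruitment",
}


def bfd_activity(receptor, ligand=None):
    """Look up the described activity of a BFD for a given ligand state."""
    return BFD_ACTIVITY.get((receptor, ligand), "not described")


print(bfd_activity("TR"))           # repressor
print(bfd_activity("TR", "T3"))     # activator
print(bfd_activity("GR", "RU486"))  # repressor
```

Because the activity is keyed on the ligand state, the same fusion molecule can be switched between activation and repression by changing the second molecule supplied.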

In additional embodiments, a plurality of fusion molecules can be used in the disclosed methods. For example, a plurality of localization domain/DNA-binding domain fusions can be used; and a plurality of localization domain/DNA-binding domain/regulatory domain fusions can be used.

For these and other applications, exogenous molecules can be formulated with a pharmaceutically acceptable carrier, as is known to those of skill in the art. See, for example, Remington's Pharmaceutical Sciences, 17th ed., 1985; and co-owned WO 00/42219.

Polynucleotide and Polypeptide Delivery

The compositions described herein can be provided to the target cell in vitro or in vivo. In addition, the compositions can be provided as polypeptides, polynucleotides or a combination thereof.

A. Delivery of Polynucleotides

In certain embodiments, the compositions are provided as one or more polynucleotides. Further, as noted above, a localization domain-containing composition can be designed as a fusion between a polypeptide DNA-binding domain and a localization domain and can be encoded by a fusion nucleic acid. In both fusion and non-fusion cases, the nucleic acid can be cloned into intermediate vectors for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors for storage or manipulation of the nucleic acid or production of protein can be, for example, prokaryotic vectors (e.g., plasmids), shuttle vectors, insect vectors, or viral vectors. A nucleic acid encoding a localization domain or a localization domain fusion can also be cloned into an expression vector, for administration to a bacterial cell, fungal cell, protozoal cell, plant cell, or animal cell, preferably a mammalian cell, more preferably a human cell.

To obtain expression of a cloned nucleic acid, it is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., supra; Ausubel et al., supra; and Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990). Bacterial expression systems are available in, e.g., E. coli, Bacillus sp., and Salmonella. Palva et al. (1983) Gene 22:229-235. Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available, for example, from Invitrogen, Carlsbad, Calif. and Clontech, Palo Alto, Calif.

The promoter used to direct expression of the nucleic acid of choice depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification. In contrast, when a dedifferentiation protein is to be used in vivo, either a constitutive or an inducible promoter is used, depending on the particular use of the protein. In addition, a weak promoter can be used, such as HSV TK or a promoter having similar activity. The promoter typically can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tet-regulated systems and the RU-486 system. See, e.g., Gossen et al. (1992) Proc. Natl. Acad. Sci. USA 89:5547-5551; Oligino et al. (1998) Gene Ther. 5:491-496; Wang et al. (1997) Gene Ther. 4:432-441; Neering et al. (1996) Blood 88:1147-1155; and Rendahl et al. (1998) Nat. Biotechnol. 16:757-761.

In addition to a promoter, an expression vector typically contains a transcription unit or expression cassette that contains additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence, and signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding, and/or translation termination. Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
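As a rough illustration of how the cassette elements listed above are ordered relative to the coding sequence, the sketch below returns a typical 5′-to-3′ layout for a prokaryotic or a eukaryotic host. The function name `cassette_layout` and the element labels are hypothetical; the layouts are simplified textbook arrangements offered as assumptions, not constructs specified by this disclosure, and the enhancer position is shown upstream only for illustration.

```python
def cassette_layout(host, enhancer=False):
    """Return a simplified 5'-to-3' list of expression cassette elements."""
    if host == "prokaryotic":
        # Ribosome binding precedes the coding sequence; a terminator follows it.
        layout = [
            "promoter",
            "ribosome binding site",
            "coding sequence (with stop codon)",
            "transcription terminator",
        ]
    elif host == "eukaryotic":
        # The transcript is polyadenylated downstream of the coding sequence.
        layout = [
            "promoter",
            "coding sequence (with stop codon)",
            "polyadenylation signal",
        ]
    else:
        raise ValueError("host must be 'prokaryotic' or 'eukaryotic'")
    if enhancer:
        # Enhancers are optional; upstream placement here is illustrative only.
        layout.insert(0, "enhancer")
    return layout


print(" -> ".join(cassette_layout("eukaryotic", enhancer=True)))
```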

The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the encoded polypeptide, e.g., expression in plants, animals, bacteria, fungi, protozoa etc. Standard bacterial expression vectors include plasmids such as pBR322, pBR322-based plasmids, pSKF, pET23D, and commercially available fusion expression systems such as GST and LacZ. Epitope tags can also be added to recombinant proteins to provide convenient methods of isolation, for monitoring expression, and for monitoring cellular and subcellular localization, e.g., hemagglutinin (HA), c-myc or FLAG.

Regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, CMV promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.

Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High-yield expression systems are also suitable, such as baculovirus vectors in insect cells, with an inserted nucleic acid sequence under the transcriptional control of the polyhedrin promoter or any other strong baculovirus promoter.

Elements that are typically included in expression vectors also include a replicon that functions in E. coli (or in the prokaryotic host, if other than E. coli), a selective marker, e.g., a gene encoding antibiotic resistance, to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the vector to allow insertion of recombinant sequences.

Standard transfection methods can be used to produce bacterial, mammalian, yeast, insect, or other cell lines that express large quantities of heterologous proteins, which can be purified, if desired, using standard techniques. See, e.g., Colley et al. (1989) J. Biol. Chem. 264:17619-17622; and Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed.) 1990. Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques. See, e.g., Morrison (1977) J. Bacteriol. 132:349-351; Clark-Curtiss et al. (1983) in Methods in Enzymology 101:347-362 (Wu et al., eds).

Any procedure for introducing foreign nucleotide sequences into host cells can be used. These include, but are not limited to, the use of calcium phosphate transfection, DEAE-dextran-mediated transfection, polybrene, protoplast fusion, electroporation, lipid-mediated delivery (e.g., liposomes), microinjection, particle bombardment, introduction of naked DNA, plasmid vectors, viral vectors (both episomal and integrative) and any of the other well known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the protein of choice.

Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding fusion polypeptides to cells in vitro. Preferably, nucleic acids are administered for in vivo or ex vivo gene therapy uses. Non-viral vector delivery systems include DNA plasmids, naked nucleic acid, and nucleic acid complexed with a delivery vehicle such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For reviews of gene therapy procedures, see, for example, Anderson (1992) Science 256:808-813; Nabel et al. (1993) Trends Biotechnol. 11:211-217; Mitani et al. (1993) Trends Biotechnol. 11:162-166; Dillon (1993) Trends Biotechnol. 11:167-175; Miller (1992) Nature 357:455-460; Van Brunt (1988) Biotechnology 6(10):1149-1154; Vigne (1995) Restorative Neurology and Neuroscience 8:35-36; Kremer et al. (1995) British Medical Bulletin 51(1):31-44; Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds), 1995; and Yu et al. (1994) Gene Therapy 1:13-26.

Methods of non-viral delivery of nucleic acids include lipofection, microinjection, ballistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Nucleic acid can be delivered to cells (in vitro or ex vivo administration) or to target tissues (in vivo administration).

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those of skill in the art. See, e.g., Crystal (1995) Science 270:404-410; Blaese et al. (1995) Cancer Gene Ther. 2:291-297; Behr et al. (1994) Bioconjugate Chem. 5:382-389; Remy et al. (1994) Bioconjugate Chem. 5:647-654; Gao et al. (1995) Gene Therapy 2:710-722; Ahmad et al. (1992) Cancer Res. 52:4817-4820; and U.S. Pat. Nos. 4,186,183; 4,217,344; 4,235,871; 4,261,975; 4,485,054; 4,501,728; 4,774,085; 4,837,028 and 4,946,787.

The use of RNA or DNA virus-based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, wherein the modified cells are administered to patients (ex vivo). Conventional viral based systems for the delivery of nucleic acids include retroviral, lentiviral, poxviral, adenoviral, adeno-associated viral, vesicular stomatitis viral and herpesviral vectors. Integration in the host genome is possible with certain viral vectors, including the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, allowing alteration and/or expansion of the potential target cell population. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors have a packaging capacity of up to 6-10 kb of foreign sequence and are comprised of cis-acting long terminal repeats (LTRs). The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof. Buchscher et al. (1992) J. Virol. 66:2731-2739; Johann et al. (1992) J. Virol. 66:1635-1640; Sommerfelt et al. (1990) Virol. 176:58-59; Wilson et al. (1989) J. Virol. 63:2374-2378; Miller et al. (1991) J. Virol. 65:2220-2224; and PCT/US94/05700.

Adeno-associated virus (AAV) vectors are also used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures. See, e.g., West et al. (1987) Virology 160:38-47; U.S. Pat. No. 4,797,368; WO 93/24641; Kotin (1994) Hum. Gene Ther. 5:793-801; and Muzyczka (1994) J. Clin. Invest. 94:1351. Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al. (1985) Mol. Cell. Biol. 5:3251-3260; Tratschin et al. (1984) Mol. Cell. Biol. 4:2072-2081; Hermonat et al. (1984) Proc. Natl. Acad. Sci. USA 81:6466-6470; and Samulski et al. (1989) J. Virol. 63:3822-3828.

Recombinant adeno-associated virus vectors based on the defective and nonpathogenic parvovirus adeno-associated virus type 2 (AAV-2) are a promising gene delivery system. Exemplary AAV vectors are derived from a plasmid containing the AAV 145 bp inverted terminal repeats flanking a transgene expression cassette. Efficient gene transfer and stable transgene delivery due to integration into the genome of the transduced cell are key features for this vector system. Wagner et al. (1998) Lancet 351(9117):1702-3; and Kearns et al. (1996) Gene Ther. 9:748-55.

pLASN and MFG-S are examples of retroviral vectors that have been used in clinical trials. Dunbar et al. (1995) Blood 85:3048-305; Kohn et al. (1995) Nature Med. 1:1017-102; Malech et al. (1997) Proc. Natl. Acad. Sci. USA 94:12133-12138. PA317/pLASN was the first therapeutic vector used in a gene therapy trial. Blaese et al. (1995) Science 270:475-480. Transduction efficiencies of 50% or greater have been observed for MFG-S packaged vectors. Ellem et al. (1997) Immunol Immunother. 44(1):10-20; Dranoff et al. (1997) Hum. Gene Ther. 1:111-2.

In applications for which transient expression is preferred, adenoviral-based systems are useful. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and are capable of infecting, and hence delivering nucleic acid to, both dividing and non-dividing cells. With such vectors, high titers and levels of expression have been obtained. Adenovirus vectors can be produced in large quantities in a relatively simple system.

Replication-deficient recombinant adenoviral (Ad) vectors can be produced at high titer and readily infect a number of different cell types. Most adenovirus vectors are engineered such that a transgene replaces the Ad E1a, E1b, and/or E3 genes; the replication-defective vector is propagated in human 293 cells that supply the required E1 functions in trans. Ad vectors can transduce multiple types of tissues in vivo, including non-dividing, differentiated cells such as those found in the liver, kidney and muscle. Conventional Ad vectors have a large carrying capacity for inserted DNA. An example of the use of an Ad vector in a clinical trial involved polynucleotide therapy for antitumor immunization with intramuscular injection. Sterman et al. (1998) Hum. Gene Ther. 7:1083-1089. Additional examples of the use of adenovirus vectors for gene transfer in clinical trials include Rosenecker et al. (1996) Infection 24:5-10; Sterman et al., supra; Welsh et al. (1995) Hum. Gene Ther. 2:205-218; Alvarez et al. (1997) Hum. Gene Ther. 5:597-613; and Topf et al. (1998) Gene Ther. 5:507-513.

Packaging cells are used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and T2 cells or PA317 cells, which package retroviruses. Viral vectors used in gene therapy are usually generated by a producer cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the protein to be expressed. Missing viral functions are supplied in trans, if necessary, by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome, which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line is also infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment, which preferentially inactivates adenoviruses.

In many gene therapy applications, it is desirable that the gene therapy vector be delivered with a high degree of specificity to a particular tissue type. A viral vector can be modified to have specificity for a given cell type by expressing a ligand as a fusion protein with a viral coat protein on the outer surface of the virus. The ligand is chosen to have affinity for a receptor known to be present on the cell type of interest. For example, Han et al. (1995) Proc. Natl. Acad. Sci. USA 92:9747-9751 reported that Moloney murine leukemia virus can be modified to express human heregulin fused to gp70, and the recombinant virus infects certain human breast cancer cells expressing human epidermal growth factor receptor. This principle can be extended to other pairs of virus expressing a ligand fusion protein and target cell expressing a receptor. For example, filamentous phage can be engineered to display antibody fragments (e.g., Fab or Fv) having specific binding affinity for virtually any chosen cellular receptor. Although the above description applies primarily to viral vectors, the same principles can be applied to non-viral vectors. Such vectors can be engineered to contain specific uptake sequences thought to favor uptake by specific target cells.

Gene therapy vectors can be delivered in vivo by administration to an individual patient, typically by systemic administration (e.g., intravenous, intraperitoneal, intramuscular, subdermal, or intracranial infusion) or topical application, as described infra. Alternatively, vectors can be delivered to cells ex vivo, such as cells explanted from an individual patient (e.g., lymphocytes, bone marrow aspirates, tissue biopsy) or universal donor hematopoietic stem cells, followed by reimplantation of the cells into a patient, usually after selection for cells which have incorporated the vector.

Ex vivo cell transfection for diagnostics, research, or for gene therapy (e.g., via re-infusion of the transfected cells into the host organism) is well known to those of skill in the art. In a preferred embodiment, cells are isolated from the subject organism, transfected with a nucleic acid (gene or cDNA), and re-infused back into the subject organism (e.g., patient). Various cell types suitable for ex vivo transfection are well known to those of skill in the art. See, e.g., Freshney et al., Culture of Animal Cells, A Manual of Basic Technique, 3rd ed., 1994, and references cited therein, for a discussion of isolation and culture of cells from patients.

In one embodiment, hematopoietic stem cells are used in ex vivo procedures for cell transfection and gene therapy. The advantage to using stem cells is that they can be differentiated into other cell types in vitro, or can be introduced into a mammal (such as the donor of the cells) where they will engraft in the bone marrow. Methods for differentiating CD34+ stem cells in vitro into clinically important immune cell types using cytokines such as GM-CSF, IFN-γ and TNF-α are known. Inaba et al. (1992) J. Exp. Med. 176:1693-1702.

Stem cells are isolated for transduction and differentiation using known methods. For example, stem cells are isolated from bone marrow cells by panning the bone marrow cells with antibodies which bind unwanted cells, such as CD4+ and CD8+ (T cells), CD45+ (pan-B cells), GR-1 (granulocytes), and Iad (differentiated antigen presenting cells). See Inaba et al., supra.

Vectors (e.g., retroviruses, adenoviruses, liposomes, etc.) containing therapeutic nucleic acids can also be administered directly to the organism for transduction of cells in vivo. Alternatively, naked DNA can be administered. Administration is by any of the routes normally used for introducing a molecule into ultimate contact with blood or tissue cells. Suitable methods of administering such nucleic acids are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.

Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions, as described below. See, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1989.

B. Delivery of Polypeptides

In other embodiments, for example in certain in vitro situations, target cells are cultured in a medium containing localization domain fusion polypeptides or functional fragments thereof.

An important factor in the administration of polypeptide compounds is ensuring that the polypeptide has the ability to traverse the plasma membrane of a cell, or the membrane of an intracellular compartment such as the nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely permeable to small, nonionic lipophilic compounds and are inherently impermeable to polar compounds, macromolecules, and therapeutic or diagnostic agents. However, proteins, lipids and other compounds, which have the ability to translocate polypeptides across a cell membrane, have been described.

For example, “membrane translocation polypeptides” have amphiphilic or hydrophobic amino acid subsequences that have the ability to act as membrane-translocating carriers. In one embodiment, homeodomain proteins have the ability to translocate across cell membranes. The shortest internalizable peptide of a homeodomain protein, Antennapedia, was found to be the third helix of the protein, from amino acid position 43 to 58. Prochiantz (1996) Curr. Opin. Neurobiol. 6:629-634. Another subsequence, the h (hydrophobic) domain of signal peptides, was found to have similar cell membrane translocation characteristics. Lin et al. (1995) J. Biol. Chem. 270:14255-14258.

Examples of peptide sequences which can be linked to a polypeptide for facilitating its uptake into cells include, but are not limited to: an 11 amino acid peptide of the tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al. (1996) Curr. Biol. 6:84); the third helix of the 60-amino acid long homeodomain of Antennapedia (Derossi et al. (1994) J. Biol. Chem. 269:10444); the h region of a signal peptide, such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra); and the VP22 translocation domain from HSV (Elliot et al. (1997) Cell 88:223-233). Other suitable chemical moieties that provide enhanced cellular uptake can also be linked, either covalently or non-covalently, to the fusion polypeptides disclosed herein.

Toxin molecules also have the ability to transport polypeptides across cell membranes. Often, such molecules (called “binary toxins”) are composed of at least two parts: a translocation or binding domain and a separate toxin domain. Typically, the translocation domain, which can optionally be a polypeptide, binds to a cellular receptor, facilitating transport of the toxin into the cell. Several bacterial toxins, including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA), have been used to deliver peptides to the cell cytosol as internal or amino-terminal fusions. Arora et al. (1993) J. Biol. Chem. 268:3334-3341; Perelle et al. (1993) Infect. Immun. 61:5147-5156; Stenmark et al. (1991) J. Cell Biol. 113:1025-1032; Donnelly et al. (1993) Proc. Natl. Acad. Sci. USA 90:3530-3534; Carbonetti et al. (1995) Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295; Sebo et al. (1995) Infect. Immun. 63:3851-3857; Klimpel et al. (1992) Proc. Natl. Acad. Sci. USA 89:10277-10281; and Novak et al. (1992) J. Biol. Chem. 267:17186-17193.

Such subsequences can be used to translocate polypeptides, including the fusion polypeptides disclosed herein, across a cell membrane. This is accomplished, for example, by derivatizing the fusion polypeptide with one of these translocation sequences, or by forming an additional fusion of the translocation sequence with the fusion polypeptide. Optionally, a linker can be used to link the fusion polypeptide and the translocation sequence. Any suitable linker can be used, e.g., a peptide linker.

A suitable polypeptide can also be introduced into an animal cell, preferably a mammalian cell, via liposomes and liposome derivatives such as immunoliposomes. The term “liposome” refers to vesicles comprised of one or more concentrically ordered lipid bilayers, which encapsulate an aqueous phase. The aqueous phase typically contains the compound to be delivered to the cell.

The liposome fuses with the plasma membrane, thereby releasing the compound into the cytosol. Alternatively, the liposome is phagocytosed or taken up by the cell in a transport vesicle. Once in the endosome or phagosome, the liposome is either degraded or it fuses with the membrane of the transport vesicle and releases its contents.

In current methods of drug delivery via liposomes, the liposome ultimately becomes permeable and releases the encapsulated compound at the target tissue or cell. For systemic or tissue specific delivery, this can be accomplished, for example, in a passive manner wherein the liposome bilayer is degraded over time through the action of various agents in the body. Alternatively, active drug release involves using an agent to induce a permeability change in the liposome vesicle. Liposome membranes can be constructed so that they become destabilized when the environment becomes acidic near the liposome membrane. See, e.g., Proc. Natl. Acad. Sci. USA 84:7851 (1987); Biochemistry 28:908 (1989). When liposomes are endocytosed by a target cell, for example, they become destabilized and release their contents. This destabilization is termed fusogenesis. Dioleoylphosphatidylethanolamine (DOPE) is the basis of many “fusogenic” systems.

For use with the methods and compositions disclosed herein, liposomes typically comprise a fusion polypeptide as disclosed herein, a lipid component, e.g., a neutral and/or cationic lipid, and optionally include a receptor-recognition molecule such as an antibody that binds to a predetermined cell surface receptor or ligand (e.g., an antigen). A variety of methods are available for preparing liposomes as described in, e.g., U.S. Pat. Nos. 4,186,183; 4,217,344; 4,235,871; 4,261,975; 4,485,054; 4,501,728; 4,774,085; 4,837,028; 4,946,787; PCT Publication No. WO 91/17424; Szoka et al. (1980) Ann. Rev. Biophys. Bioeng. 9:467; Deamer et al. (1976) Biochim. Biophys. Acta 443:629-634; Fraley, et al. (1979) Proc. Natl. Acad. Sci. USA 76:3348-3352; Hope et al. (1985) Biochim. Biophys. Acta 812:55-65; Mayer et al. (1986) Biochim. Biophys. Acta 858:161-168; Williams et al. (1988) Proc. Natl. Acad. Sci. USA 85:242-246; Liposomes, Ostro (ed.), 1983, Chapter 1; Hope et al. (1986) Chem. Phys. Lip. 40:89; Gregoriadis, Liposome Technology (1984) and Lasic, Liposomes: from Physics to Applications (1993). Suitable methods include, for example, sonication, extrusion, high pressure/homogenization, microfluidization, detergent dialysis, calcium-induced fusion of small liposome vesicles and ether-fusion methods, all of which are well known in the art.

In certain embodiments, it may be desirable to target a liposome using targeting moieties that are specific to a particular cell type, tissue, and the like. Targeting of liposomes using a variety of targeting moieties (e.g., ligands, receptors, and monoclonal antibodies) has been previously described. See, e.g., U.S. Pat. Nos. 4,957,773 and 4,603,044.

Examples of targeting moieties include monoclonal antibodies specific to antigens associated with neoplasms, such as prostate cancer specific antigen and MAGE. Tumors can also be diagnosed by detecting gene products resulting from the activation or over-expression of oncogenes, such as ras or c-erbB2. In addition, many tumors express antigens normally expressed by fetal tissue, such as the alphafetoprotein (AFP) and carcinoembryonic antigen (CEA). Sites of viral infection can be diagnosed using various viral antigens such as hepatitis B core and surface antigens (HBVc, HBVs), hepatitis C antigens, Epstein-Barr virus antigens, human immunodeficiency type-1 virus (HIV-1) and papilloma virus antigens. Inflammation can be detected using molecules specifically recognized by surface molecules which are expressed at sites of inflammation such as integrins (e.g., VCAM-1), selectin receptors (e.g., ELAM-1) and the like.

Standard methods for coupling targeting agents to liposomes are used. These methods generally involve the incorporation into liposomes of lipid components, e.g., phosphatidylethanolamine, which can be activated for attachment of targeting agents, or incorporation of derivatized lipophilic compounds, such as lipid derivatized bleomycin. Antibody targeted liposomes can be constructed using, for instance, liposomes which incorporate protein A. See Renneisen et al. (1990) J. Biol. Chem. 265:16337-16342 and Leonetti et al. (1990) Proc. Natl. Acad. Sci. USA 87:2448-2451.

Pharmaceutical Compositions and Administration

Fusion molecules as disclosed herein, and expression vectors encoding these polypeptides, can be used in conjunction with various methods of gene therapy to facilitate the action of a therapeutic gene product. In such applications, the fusion molecule can be administered directly to a patient, e.g., to facilitate the modulation of gene expression and for therapeutic or prophylactic applications, for example, cancer, ischemia, diabetic retinopathy, macular degeneration, rheumatoid arthritis, psoriasis, HIV infection, sickle cell anemia, Alzheimer's disease, muscular dystrophy, neurodegenerative diseases, vascular disease, cystic fibrosis, stroke, and the like. Examples of microorganisms whose inhibition can be facilitated through use of the methods and compositions disclosed herein include pathogenic bacteria, e.g., Chlamydia, Rickettsial bacteria, Mycobacteria, Staphylococci, Streptococci, Pneumococci, Meningococci and Gonococci, Klebsiella, Proteus, Serratia, Pseudomonas, Legionella, Diphtheria, Salmonella, Bacilli (e.g., anthrax), Vibrio (e.g., cholera), Clostridium (e.g., tetanus, botulism), Yersinia (e.g., plague), Leptospirosis, and Borrelia (e.g., Lyme disease bacteria); infectious fungus, e.g., Aspergillus, Candida species; protozoa such as sporozoa (e.g., Plasmodia), rhizopods (e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia, etc.); viruses, e.g., hepatitis (A, B, or C), herpes viruses (e.g., VZV, HSV-1, HHV-6, HSV-II, CMV, and EBV), HIV, Ebola, Marburg and related hemorrhagic fever-causing viruses, adenoviruses, influenza viruses, flaviviruses, echoviruses, rhinoviruses, coxsackie viruses, coronaviruses, respiratory syncytial viruses, mumps viruses, rotaviruses, measles viruses, rubella viruses, parvoviruses, vaccinia viruses, HTLV viruses, retroviruses, lentiviruses, dengue viruses, papillomaviruses, polioviruses, rabies viruses, and arboviral encephalitis viruses, etc.

Administration of therapeutically effective amounts of a localization domain-DNA-binding domain fusion molecule, a localization domain-DNA-binding domain-regulatory domain fusion or a nucleic acid encoding these fusion polypeptides is by any of the routes normally used for introducing polypeptides or nucleic acids into ultimate contact with the tissue to be treated. The polypeptides or nucleic acids are administered in any suitable manner, preferably in a pharmaceutically acceptable carrier. Suitable methods of administering such modulators are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.

Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions. See, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1985.

Fusion polypeptides or nucleic acids, alone or in combination with other suitable components, can be made into aerosol formulations (i.e., they can be “nebulized”) to be administered via inhalation. Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like.

Formulations suitable for parenteral administration, such as, for example, by intravenous, intramuscular, intradermal, intracardiac and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. Compositions can be administered, for example, by intravenous infusion, orally, topically, intraperitoneally, intrapleurally, intravesically or intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampoules and vials. Injection solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind known to those of skill in the art.

Applications

The compositions and methods disclosed herein can be used to modulate a number of cellular processes. These include, but are not limited to, transcription, replication, recombination, repair, integration, maintenance of telomeres, and processes involved in chromosome stability and disjunction. Accordingly, the methods and compositions disclosed herein can be used to affect any of these processes, as well as any other processes which are influenced by localization domain fusion molecules and their effects on gene expression, intranuclear localization and chromatin structure.

In preferred embodiments, a localization domain/DNA-binding domain fusion is used to achieve targeted repression of gene expression. Targeting is based upon the specificity of the DNA-binding domain. In another embodiment, a localization domain/DNA-binding domain/transcriptional activation domain fusion is used to achieve reactivation of a developmentally-silenced gene. In additional embodiments a localization domain/DNA-binding domain/chromatin remodeling complex component fusion is used to remodel the chromatin structure of a repressed gene located in a heterochromatic nuclear compartment, to allow access of transcriptional activators, either endogenous or exogenous. In these embodiments, additional molecules, exogenous and/or endogenous, can be used to facilitate repression or activation of gene expression. The additional molecules can also be fusion molecules, for example, fusions between a DNA-binding domain and a functional domain such as an activation or repression domain or a component of a chromatin remodeling complex.

Accordingly, expression of any gene in any organism can be modulated using the methods and compositions disclosed herein, including therapeutically relevant genes, genes of infecting microorganisms, viral genes, and genes whose expression is modulated in the process of target validation. Such genes include, but are not limited to, vascular endothelial growth factor (VEGF), VEGF receptors flt and flk, CCR-5, low density lipoprotein receptor (LDLR), estrogen receptor, HER-2/neu, BRCA-1, BRCA-2, phosphoenolpyruvate carboxykinase (PEPCK), CYP7, fibrinogen, apolipoprotein A (ApoA), apolipoprotein B (ApoB), renin, nuclear factor κB (NF-κB), inhibitor of NF-κB (I-κB), tumor necrosis factors (e.g., TNF-α, TNF-β), interleukin-1 (IL-1), FAS (CD95), FAS ligand (CD95L), atrial natriuretic factor, platelet-derived factor (PDF), amyloid precursor protein (APP), tyrosinase, tyrosine hydroxylase, β-aspartyl hydroxylase, alkaline phosphatase, calpains (e.g., CAPN10), neuronal pentraxin receptor, adriamycin response protein, apolipoprotein E (apoE), leptin, leptin receptor, UCP-1, IL-1, IL-1 receptor, IL-2, IL-3, IL-4, IL-5, IL-6, IL-12, IL-15, interleukin receptors, G-CSF, GM-CSF, colony stimulating factor, erythropoietin (EPO), platelet-derived growth factor (PDGF), PDGF receptor, fibroblast growth factor (FGF), FGF receptor, PAF, p16, p19, p53, Rb, p21, myc, myb, globin, dystrophin, utrophin, cystic fibrosis transmembrane conductance regulator (CFTR), GDNF, nerve growth factor (NGF), NGF receptor, epidermal growth factor (EGF), EGF receptor, transforming growth factors (e.g., TGF-α, TGF-β), interferons (e.g., IFN-α, IFN-β and IFN-γ), insulin-related growth factor-1 (IGF-1), angiostatin, ICAM-1, signal transducer and activator of transcription (STAT), androgen receptors, e-cadherin, cathepsins (e.g., cathepsin W), topoisomerase, telomerase, bcl, bcl-2, Bax, T Cell-specific tyrosine kinase (Lck), p38 mitogen-activated protein kinase, protein tyrosine phosphatase (hPTP), adenylate cyclase, guanylate cyclase, α7 neuronal nicotinic acetylcholine receptor, 5-hydroxytryptamine (serotonin)-2A receptor, transcription elongation factor-3 (TEF-3), phosphatidylcholine transferase, ftz, PTI-1, polygalacturonase, EPSP synthase, FAD2-1, Δ-9 desaturase, Δ-12 desaturase, Δ-15 desaturase, acetyl-Coenzyme A carboxylase, acyl-ACP thioesterase, ADP-glucose pyrophosphorylase, starch synthase, cellulose synthase, sucrose synthase, fatty acid hydroperoxide lyase, and peroxisome proliferator-activated receptors, such as PPAR-γ2. See also Science 291:1177-1351 (2001) and Nature 409:813-958 (2001).

Expression of human, mammalian, bacterial, fungal, protozoal, Archaeal, plant and viral genes can be modulated. Viral genes include, but are not limited to, hepatitis virus genes such as, for example, HBV-C, HBV-S, HBV-X and HBV-P; and HIV genes such as, for example, tat and rev. Modulation of expression of genes encoding antigens of a pathogenic organism can be achieved using the disclosed methods and compositions.

Additional genes include those encoding cytokines, lymphokines, interleukins, growth factors, mitogenic factors, apoptotic factors, cytochromes, chemotactic factors, chemokine receptors (e.g., CCR-2, CCR-3, CCR-5, CXCR-4), phospholipases (e.g., phospholipase C), nuclear receptors, retinoid receptors, organellar receptors, hormones, hormone receptors, oncogenes, tumor suppressors, cyclins, cell cycle checkpoint proteins (e.g., Chk1, Chk2), senescence-associated genes, immunoglobulins, genes encoding heavy metal chelators, protein tyrosine kinases, protein tyrosine phosphatases, tumor necrosis factor receptor-associated factors (e.g., Traf-3, Traf-6), apolipoproteins, thrombic factors, vasoactive factors, neuroreceptors, cell surface receptors, G-proteins, G-protein-coupled receptors (e.g., substance K receptor, angiotensin receptor, α- and β-adrenergic receptors, serotonin receptors, and PAF receptor), muscarinic receptors, acetylcholine receptors, GABA receptors, glutamate receptors, dopamine receptors, adhesion proteins (e.g., CAMs, selectins, integrins and immunoglobulin superfamily members), ion channels, receptor-associated factors, hematopoietic factors, transcription factors, and molecules involved in signal transduction. Expression of disease-related genes, and/or of one or more genes specific to a particular tissue or cell type such as, for example, brain, muscle, heart, nervous system, circulatory system, reproductive system, genitourinary system, digestive system and respiratory system can also be modulated.

Thus, the methods and compositions disclosed herein can be used in processes such as, for example, therapeutic regulation of disease-related genes, engineering of cells for manufacture of protein pharmaceuticals, pharmaceutical discovery (including target discovery, target validation and engineering of cells for high throughput screening methods) and plant agriculture.

EXAMPLES

The following examples are presented as illustrative of, but not limiting, the claimed subject matter.

Example 1 Materials and Methods

Cloning of dMBD-Like and dMBD-LikeΔ

dMBD-likeΔ was obtained as an EST clone (LD03777, 23) and the coding sequence amplified and cloned using a T/A cloning kit (Invitrogen) according to the manufacturer's directions. DNA sequencing confirmed the fidelity of amplification and the coding sequence was then subcloned into pET-21a(+) (Novagen) using the NheI-XhoI sites.

dMBD-like expression clones were prepared by RT-PCR from total Drosophila RNA using the following primer pair:

(SEQ ID NO: 2) MBDf: 5′-GGAATTGGGAATTGCGCTAGCATGAACCCGAGCGTCACAATC-3′;
(SEQ ID NO: 3) MBDr: 5′-GCGAATTCTGTCTTGAGTGCATCCTGCAGCTTTCGCGCAACTCCG-3′.

PCR products were isolated on agarose gels, excised, reamplified and cloned into the EcoRI-NheI sites of pTYB1 (NEB). Fidelity of reverse transcription and amplification was verified by DNA sequencing.

Purification of Recombinant Protein

Recombinant dMBD-likeΔ was expressed in E. coli BL21 (DE3). 500 ml of LB were inoculated with 5 ml of an overnight culture and incubated at 37° C. to A₆₀₀ of 0.7. Induction was performed by addition of isopropyl β-D-thiogalactopyranoside (IPTG) to 1 mM and incubation at 37° C. for 3 additional hours. Cells were harvested and resuspended in 10 ml of sonication buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole, 0.1% Nonidet-P-40 (NP-40), 1 mM 2-mercaptoethanol). Purification of the soluble His-tagged protein was performed with TALON resin (Clontech) according to the manufacturer's protocol. Protein was dialyzed versus 20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 1 mM 2-mercaptoethanol, 2 mM MgCl₂. Quantitation was performed using the BioRad protein assay. Recombinant dMBD-like was prepared using the Impact CN System (New England Biolabs) according to the manufacturer's protocol.

Gel Mobility-Shift and Southwestern Assays

Gel mobility shifts were performed in 10% polyacrylamide gels run in 0.5× TBE buffer (45 mM Tris, pH 8.0, 45 mM boric acid, 1 mM EDTA) using GAC12 or GAM12 double-stranded oligonucleotide probes, essentially as described in Wade et al. (1999) Nature Genet. 23:62-66. One picomole of radiolabelled probe was mixed with purified recombinant protein as indicated in the figure legends in 10 mM Tris HCl pH 8.0, 3 mM MgCl₂, 50 mM NaCl, 0.1 mM EDTA, 0.1% NP-40, 2 mM DTT, 5% glycerol and 0.4 mg/ml BSA. The samples were incubated for 30 min at 37° C. 30 pmoles of competitor DNA (GAC12 or GAM12) were used per binding reaction. Gels were scanned on a PhosphorImager (Molecular Dynamics). The procedure used for southwestern assays is as described in Wade et al. (1999), supra.

Antibodies, Immunoblots and Immunoprecipitations

Protein samples were resolved by SDS-PAGE and transferred to Immobilon-P membrane (Millipore) following the manufacturer's recommendations. The Drosophila MBD-like antibodies were elicited in rabbits by subcutaneous injection of recombinant dMBD-likeΔ by Covance Laboratories, Inc. For immunoprecipitations, antibodies were immobilized on Protein A beads (Pierce) and subsequently incubated with 100 μg nuclear extract, or 100 μg of the BioRex 0.5 M fraction for 2 h at 4° C. with rotation. The beads were washed three times with Buffer C (0.1 M NaCl, see below) and analyzed by histone deacetylase or ATPase assays. NP-40 (0.01%) was included in all buffers.

Histone Deacetylase and ATPase Assays

Chicken histone octamers (20.25 nmoles) were acetylated using recombinant yeast HAT1p and ³H-acetyl CoA (5.3 nm) followed by a cold Acetyl-CoA (100 nm) chase for complete acetylation. Deacetylation of the samples was carried out in a reaction (200 μl) containing 25 mM Tris pH 8.0, 50 mM NaCl, 1 mM EDTA, 10% glycerol, and 1 μg ³H-histone octamer substrate. Reactions were incubated for 30 min at 30° C., were terminated by adding 50 μl stop solution (0.1 M HCl and 0.16 M HAc), and extracted with 600 μl ethyl acetate. 450 μl of the organic layer was counted in a liquid scintillation counter. Released acetate is indicated in the figures as cpm.
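Because only 450 μl of the 600 μl ethyl acetate extract is counted, the measured cpm represents a fixed fraction of the extracted label. The following is a minimal sketch of that scaling, offered for illustration only; the function name and the example count are hypothetical, and the sketch assumes that the organic layer volume remains 600 μl after extraction. The figures herein report cpm as counted, and this correction is not stated to have been applied.

```python
def total_extracted_cpm(counted_cpm, organic_volume_ul=600.0, counted_volume_ul=450.0):
    """Scale cpm from the counted aliquot up to the whole organic layer.

    Assumes (not stated in the source) that the organic layer keeps its
    nominal 600 ul volume and that label is evenly distributed in it.
    """
    if not 0 < counted_volume_ul <= organic_volume_ul:
        raise ValueError("counted volume must be positive and no larger than the organic layer")
    return counted_cpm * organic_volume_ul / counted_volume_ul


# Hypothetical reading of 3000 cpm in the 450 ul aliquot.
example_total = total_extracted_cpm(3000.0)
print(example_total)
```

The scaling factor is constant (600/450), so relative comparisons between samples are unaffected by whether the correction is applied.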

In an ATPase assay, samples were incubated with [γ-³²P]ATP in the absence or presence of chicken erythrocyte mononucleosomes for 30 min at room temperature. Reactions were spotted on PEI-cellulose thin-layer chromatography plates and developed in 1 M formic acid and 0.5 M LiCl. ATP hydrolysis was quantitated using a PhosphorImager (Molecular Dynamics) with Image Quant Software.

Fractionation of dMBD-Like-Containing Complexes

S2 cells were grown in suspension culture in Grace's Insect media supplemented with 10% heat-treated fetal bovine serum (FBS). Cells were centrifuged at 5,000 rpm for 10 min. The pellet was resuspended in 40 ml of Buffer A (10 mM Hepes pH 7.5, 15 mM KCl, 2 mM MgCl₂, 0.1 mM EDTA, 1 mM DTT, 0.5 mM PMSF) and 2.7 ml of Buffer B (50 mM Hepes pH 7.5, 1 M KCl, 30 mM MgCl₂, 0.1 mM EDTA, 1 mM DTT, 0.5 mM PMSF) and centrifuged at 5,000 rpm for 10 min. The pellet was resuspended in 10 ml of Buffer A and re-centrifuged as above. Cells were resuspended in 20 ml of Buffer A and homogenized with a Dounce homogenizer. 1.5 ml of Buffer B were added and the mixture was again homogenized for a few additional strokes. The homogenate was centrifuged at 8,000 rpm for 8 min. Nuclei were resuspended in 20 ml of Buffer A and homogenized with 6-8 strokes. 2 ml of 4 M (NH₄)₂SO₄ were added. The suspension was rotated for 30 min and centrifuged at 10,000 rpm for 30 min. The supernatant was dialyzed versus Buffer C (100 mM KCl, 20 mM Hepes pH 7.5, 1 mM EGTA, 1.5 mM MgCl₂, 1 mM PMSF, 0.5 mM DTT, 10% glycerol) and cleared in a SW50.1 rotor at 40,000 rpm for 60 minutes.
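The final salt concentrations that result from the mixing steps above follow from a volume-weighted average. The sketch below is illustrative only; the function name is hypothetical, and the calculation assumes that volumes are additive and that the volume of the cell or nuclear pellet is negligible, neither of which is stated in the procedure.

```python
def mixed_concentration(components):
    """Return the concentration after mixing (volume_ml, concentration) pairs.

    Assumes additive volumes and ignores the pellet volume.
    """
    total_volume = sum(volume for volume, _ in components)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(volume * conc for volume, conc in components) / total_volume


# First resuspension: 40 ml Buffer A (15 mM KCl) plus 2.7 ml Buffer B (1 M KCl).
kcl_mM = mixed_concentration([(40.0, 15.0), (2.7, 1000.0)])

# Nuclear extraction: 20 ml Buffer A (no ammonium sulfate) plus 2 ml of 4 M (NH4)2SO4.
ammonium_sulfate_M = mixed_concentration([(20.0, 0.0), (2.0, 4.0)])

print(round(kcl_mM, 1), round(ammonium_sulfate_M, 2))
```

Under these assumptions the first resuspension is at roughly 77 mM KCl and the nuclear extraction is at roughly 0.36 M ammonium sulfate.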

The dialyzed extract was loaded onto a BioRex70 (Na⁺) (BioRad) column equilibrated with Buffer C at 10 mg protein per ml packed bed volume (cv), washed with 3 cv Buffer C (0.1 M), and step eluted with Buffer C (0.5 M). The 0.5 M fraction containing all the detectable dMBD-likeΔ was fractionated over MonoQ HR5/5 (Pharmacia Biotech) in a 20 cv linear gradient from Buffer C (0.1 M) to Buffer C (1 M) and collected in 0.5 ml fractions. All fractions were analyzed by immunoblot and histone-deacetylase assay. The fraction containing the peak of dMBD-likeΔ was dialyzed, centrifuged, and fractionated on a Superose 6 HR10/30 gel filtration column (Pharmacia Biotech). All fractions were analyzed by immunoblotting and HDAC assay. Antisera to Mi-2 and Mta1 were described elsewhere (Wade et al., 1999, supra) as were SIN3 and RPD3 antisera.
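The nominal salt concentration of each MonoQ fraction can be estimated from the gradient parameters given above. The following sketch is illustrative only. It assumes a 1 ml bed volume for the MonoQ HR5/5 column (an assumption not stated herein), a strictly linear gradient, and no delay volume between the mixer and the fraction collector; the function name is hypothetical.

```python
def gradient_fractions(start_m=0.1, end_m=1.0, column_volume_ml=1.0,
                       gradient_cv=20, fraction_ml=0.5):
    """List (fraction number, nominal salt molarity at the fraction midpoint).

    column_volume_ml=1.0 is an assumed bed volume; delay volume is ignored.
    """
    total_ml = column_volume_ml * gradient_cv
    n_fractions = round(total_ml / fraction_ml)
    fractions = []
    for i in range(n_fractions):
        midpoint_ml = (i + 0.5) * fraction_ml
        salt_m = start_m + (end_m - start_m) * midpoint_ml / total_ml
        fractions.append((i + 1, salt_m))
    return fractions


fractions = gradient_fractions()
print(len(fractions), fractions[0], fractions[-1])
```

Under these assumptions the gradient spans 40 fractions, and each fraction covers a nominal 22.5 mM increment in salt. Actual elution positions would be determined by conductivity measurement or, as described above, by immunoblot and enzymatic assay of the fractions.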

Northern Blot Analysis

RNA was prepared from staged embryos, larvae, or adult flies using the Trizol reagent (Life Technologies) according to the manufacturer's directions. 10 μg total RNA was loaded per lane, resolved on 1.2% agarose gels containing formaldehyde, transferred to nylon, and hybridized with a random-primed probe corresponding to the coding sequence of dMBD-likeΔ.

Cotransfection Assays

Drosophila S2 cells were grown in Schneider's medium (Sigma) at 27° C. containing 10% FBS and penicillin/streptomycin. Cells were transfected using the Superfect reagent (Qiagen) following the manufacturer's directions. Transfection assays included 1.5 μg of either the pG₅DE₅tkLuc reporter or the p-37tkRLuc internal control, essentially as described in Chen et al. (1998) Mol Cell Biol 18:7259-7268. The Gal4 DNA binding domain constructs, pACTIN-SV-Gal4-Gro, pACTIN-SV-Gal4-MBD and pACTIN-SV-Gal4-MBDΔ were constructed by insertion of the Gal4 DBD into the HindIII/PstI sites of pACTIN-SV followed by insertion of Gro, dMBD-like and dMBD-likeΔ into the EcoRI site of the resulting plasmid. Quantities of individual Gal4 plasmids were varied as described in the figure legends. The total amount of plasmid was normalized to 4 μg by addition of pACTIN-SV vector (Huynh et al. (1999) J. Mol Biol 288:13-20) as carrier. Cells were harvested 12 h after transfection and luciferase assays were performed using an Enhanced Luciferase Assay Kit (Pharmingen) according to the manufacturer's instructions. Where indicated, transfected cells were treated with Trichostatin A (TSA, Wako) for 24 h before harvest.
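The amount of pACTIN-SV carrier needed to bring each transfection to 4 μg total plasmid is the difference between 4 μg and the sum of the other plasmids. The sketch below illustrates that bookkeeping only; the function name and the per-plasmid amounts in the example are hypothetical and are not taken from the figure legends.

```python
def carrier_amount_ug(plasmids_ug, total_ug=4.0):
    """Return the carrier plasmid (ug) required to reach the target total."""
    used = sum(plasmids_ug.values())
    if used > total_ug:
        raise ValueError("plasmid amounts already exceed the target total")
    return total_ug - used


# Hypothetical transfection mix; amounts are for illustration only.
example_mix = {"reporter": 1.5, "internal control": 1.5, "Gal4 fusion": 0.5}
carrier_ug = carrier_amount_ug(example_mix)
print(carrier_ug)
```

Holding total plasmid constant in this way keeps the DNA load per transfection the same while the quantity of the Gal4 fusion plasmid is varied.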

Polytene Chromosome Staining

Polytene chromosome squashes and staining were performed on Canton-S flies as described in Zink and Paro (1989) Nature 337:468-471 and Westwood (1991) Nature 353:822-827. Briefly, salivary glands were dissected in PBS and were placed directly in fixative containing 3.7% formaldehyde, 45% acetic acid for 1 min prior to squashing. The spreads were stained with α-dMBD-like (1:200) followed by Alexa 594-conjugated donkey α-goat IgG (1:400) (Molecular Probes). DNA was visualized with 4′,6-diamidino-2-phenylindole (DAPI; 1:1000). Control spreads stained with pre-immune serum, at an equivalent concentration to that indicated above, showed no staining. Each staining experiment was performed multiple times.

Example 2 Identification of a Drosophila MBD Family

A search for Drosophila sequences similar to vertebrate methyl CpG binding proteins (MBDs) yielded multiple candidates (FIG. 1A). The Drosophila proteins are similar to vertebrate MBD proteins only in the putative methyl CpG binding domain with the exception of dMBD-like. The solution structure of this motif has been solved for MeCP2 (Wakefield et al. (1999) J Mol Biol 291:1055-1065) and for MBD1 (Ohki et al. (1999) EMBO J. 18:6653-6661). It consists of a wedge-shaped structure composed of four antiparallel β-strands on one face and an α-helix and hairpin loop on the other (Rubin et al. (2000) Science 287:2204-2215).

The sequences of the putative Drosophila MBD proteins were compared with those of their vertebrate counterparts, focusing on residues critical to the structure of the methyl CpG binding domain. Two uncharacterized products of the Drosophila genome project, CG10042 and CG12196 (Adams et al. (2000) Science 287:2185-2195), and the product of the six-banded gene (sba, Zeidler et al. (1997) Biol Chem. 378:1119-1124) contain most of these sequence features. Specifically, the regions corresponding to the four beta strands are well conserved, including hydrophobic residues (FIG. 1A) proposed to be crucial for integrity of the fold (Ohki et al., supra). There is some variation in the number of amino acids between loop L2 and the hairpin loop, although the vertebrate MBD family members also differ in this aspect. Basic residues that constitute a charged surface on one side of the vertebrate MBD structures are also well conserved (FIG. 1A). Finally, two hydrophobic residues critical to the structure of the hairpin loop are also present. An important difference between the Drosophila and vertebrate proteins occurs in the loop L1, located between β-strands two and three (FIG. 1A). In vertebrate methyl CpG binding proteins, the spacing between strands β2 and β3 is invariant and the amino acid side chains in this loop are very similar. In contrast, the Drosophila proteins have variations both in length and in side chain chemistry in this loop (FIG. 1A). This region of MBD1 and MeCP2 undergoes a conformational change upon binding to methylated DNA and is implicated as crucial for protein-DNA interaction. It seems likely that alterations in this region of the protein will abolish methyl CpG binding activity.

Drosophila MBD-like (dMBD-like) was previously identified as a sequence relative of vertebrate MBD2 and MBD3 (Zhang et al. (1999) Genes Dev 13:1924-1935; Tweedie et al. (1999) Nature Genet 23:329-390). It is similar to MBD2 and MBD3 throughout its length (FIG. 1B) and is encoded by a single gene (Flybase ID FBgn0027950, 1). Two mRNAs are generated from this locus, one of 1115 bases, a second of 842 bases (Rubin et al. (2000) Science 287:2222-2224). The protein products of these alternatively spliced mRNAs differ in the amino acids encoded by exon 2 (FIG. 1B).

The two methyl CpG binding domain protein homologs were named dMBD-like (product of the 1115 base mRNA) and dMBD-likeΔ (product of the 842 base mRNA). Both dMBD-like isoforms share extensive sequence similarity with the recently described forms of MBD3 from Xenopus (Wade et al. (1999) Nature Genet 23:62-66), particularly the third exon of the Drosophila gene (FIG. 1B). However, there are several gaps and non-conserved amino acids in the region corresponding to the MBD (FIG. 1B). The two dMBD-like proteins have an opa-like repeat (Wharton et al. (1985) Cell 40:55-62) inserted in the loop between strands β2 and β3 (FIG. 1B); this region is predicted to interact with DNA (Wakefield et al. (1999) J Mol Biol 291:1055-1065). In addition, dMBD-like lacks the distal portion of the α-helix making up one face of the wedge (FIG. 1B). The shorter isoform, dMBD-likeΔ, completely lacks the fourth β-strand, the α-helix, and the hairpin loop. Finally, there are numerous amino acid changes at positions predicted to be crucial for DNA interaction and structural integrity of the domain. Thus, it seemed unlikely that either protein would bind methylated DNA.

Example 3 dMBD-Like Fails to Bind Methylated DNA

The two dMBD-like proteins were expressed in bacteria and purified, and their binding properties were compared to those of Xenopus MBD3, a protein previously demonstrated to bind selectively to methylated DNA (Wade et al. (1999), supra). In Southwestern assays using immobilized protein, neither isoform interacted with the probes, regardless of methylation status (FIG. 1C). Thus, either the proteins fail to bind or they are unable to refold on the membrane surface.

Solution interactions with DNA were also examined using an electrophoretic mobility shift assay (FIG. 1D). Neither of the Drosophila proteins bound under conditions where Xenopus MBD3 bound methylated DNA selectively. In fact, no binding was observed for the dMBD-like isoforms even after reducing the concentration of cold competitor DNA by up to 50-fold, resulting in a mass excess of radiolabelled probe over cold competitor (see Example 1). No interaction, not even non-specific aggregation, was detected under these conditions. Therefore, the results indicate that neither dMBD-like isoform binds DNA, in keeping with the lack of genome-wide methylation in Drosophila.

Example 4 dMBD-LikeΔ Associates with HDAC and Nucleosome-Stimulated ATPase Activities

The sequence similarity between exon three of dMBD-like and vertebrate MBD2 and MBD3 (Yao et al. (1993) Nature 366:476-479) implies conservation of function. One potential role for this region is interaction with other proteins. As Drosophila contains homologs of many proteins known to be components of HDAC-containing corepressor complexes in other systems (Rubin et al. (2000) Science 287:2204-2215 and Adams et al. (2000) Science 287:2185-2195), the presence of dMBD-like as a component of such a complex (or complexes) in Drosophila was examined.

α-dMBD-like polyclonal antisera were used to investigate potential interactions between dMBD-like and known components of corepressor complexes. Immunoblot analysis confirmed that the antisera recognized both isoforms of dMBD-like (FIG. 2A). Interestingly, only the shorter isoform, dMBD-likeΔ, was detected in nuclear extracts from S2 cells (FIG. 2A). Immunoprecipitations were then performed from S2 nuclear extracts and the precipitated proteins were assayed for enzymatic activities associated with corepressor complexes. Immune serum, but not preimmune serum, efficiently precipitated histone deacetylase activity (FIG. 2B) and also precipitated ATPase activity (FIG. 2C). Like vertebrate and Drosophila Mi-2 (Chen et al. (1999) Genes Devel. 13:2218-2230; Guschin et al. (2000) Biochemistry 39:5238-5245), the precipitated ATPase activity was stimulated by nucleosomes (FIG. 2C). These results indicate that dMBD-likeΔ is associated with an undefined histone deacetylase and a nucleosome-stimulated ATPase in S2 nuclei, suggesting inclusion of dMBD-likeΔ in a Drosophila Mi-2-like corepressor complex.

The relationship of dMBD-likeΔ with other proteins in S2 cells was also examined using classical biochemical techniques. Nuclear extracts were fractionated using ion exchange and gel filtration chromatography; dMBD-likeΔ was assayed in the fractions by immunoblot. All the detectable dMBD-likeΔ bound the BioRex 70 column and was eluted at 0.5 M NaCl. Gradient elution of the BioRex 70 pool on MonoQ yielded a major peak of deacetylase activity, precisely coeluting with SIN3 and RPD3 (FIG. 3A). The peak of dMBD-likeΔ by immunoblot was resolved from the peaks of HDAC activity, RPD3, and SIN3 by a single fraction (FIG. 3A). The peak fraction of dMBD-likeΔ from the MonoQ column was further purified using a Superose 6 gel filtration column. The peaks of dMBD-likeΔ, Mi-2, and MTA1-like (Wade et al. (1999) Nature Genet. 23:62-66) coeluted at a position consistent with a molecular mass of approximately 1 MDa (FIG. 3B). SIN3, RPD3, and p55 (Zeidler et al., supra; Martinez-Balbas et al. (1998) Proc. Natl. Acad. Sci. USA 95:132-137), the Drosophila RbAp48/p46 homolog, also closely, but not precisely, coeluted. Thus, dMBD-likeΔ copurifies with Mi-2 and MTA1-like, consistent with its inclusion in a Drosophila Mi-2 complex similar to that observed in vertebrates.

Example 5 dMBD-Like Represses Transcription when Tethered Near a Promoter

A transcription assay, essentially as described in Chen et al. (1999) Genes Devel 13:2218-2230 and Chen et al. (1998) Mol Cell Biol 18:7259-7268, was used to assess the consequences of tethering the dMBD-like isoforms at a promoter. Both dMBD-like and dMBD-likeΔ were fused to the Gal4 DNA binding domain (DBD). A Gal4-Groucho fusion (see Chen et al. (1998) and Chen et al. (1999), both supra) and the Gal4 DBD were used as positive and negative controls for transcriptional repression, respectively (FIG. 4A). Gal4 fusions to dMBD-like, dMBD-likeΔ and Groucho mediated dose-dependent transcriptional repression following transfection into S2 cells (FIG. 4B). Control experiments showed that repression required a Gal4 site in the reporter plasmid (FIG. 4B, lower) and that transfection of dMBD-like or dMBD-likeΔ lacking the Gal4 DBD failed to repress transcription. The expression levels of the transfected Gal4 fusion proteins were equivalent by immunoblot (FIG. 4C). Further, the repression seen with the two tethered dMBD-like isoforms was similar to that observed with the well-characterized repressor Groucho (FIG. 4B).

If dMBD-like represses transcription through recruitment of a Drosophila Mi-2 complex, transcriptional repression should be sensitive to inhibitors of histone deacetylases. Accordingly, the effect of Trichostatin-A (TSA), a histone deacetylase inhibitor, on dMBD-like-mediated repression was tested. Repression mediated by tethering of dMBD-like or dMBD-likeΔ was largely relieved by TSA; this effect was qualitatively very similar to that of TSA on Groucho-mediated repression (FIG. 4D).

In sum, the results indicate that both isoforms of dMBD-like function as transcriptional corepressors through recruitment of histone deacetylase activity, consistent with the proposed function of the Drosophila Mi-2 complex.

Example 6 Developmental Expression Profile of dMBD-Like

Northern analysis was used to ascertain the expression patterns of the two dMBD-like isoforms during Drosophila development (FIG. 5). Two transcripts, corresponding to the splice variants of dMBD-like, were present in early embryos (FIG. 5A). Interestingly, the 0-2 hour embryos had only the longer mRNA. Levels of this mRNA decline precipitously after 12 hours of embryonic development and remain undetectable in larval stages and adult males; however, this mRNA is present in adult females, possibly due to maternal mRNA in the ovary. Protein expression patterns partially mimic the mRNA expression data (FIG. 5B). Again, only dMBD-like was observed in 0-2 hour embryos while both dMBD-like and dMBD-likeΔ were present in 12-24 hour embryos. In the final embryonic stage and in the first larval stage, only the dMBD-likeΔ isoform was present. Neither protein isoform was detectable in the remaining larval stages or in adults, despite the presence of their mRNAs (FIG. 5A).

Example 7 dMBD-Like Associates with Heterochromatin and a Small Number of Euchromatic Sites in Polytene Chromosomes

To investigate target gene specificity of dMBD-like, its distribution on Drosophila salivary gland polytene chromosomes from third instar larvae was examined. Polytene chromosomes are ideal for this analysis since they are thought to reflect the biochemical and structural properties of chromatin of diploid interphase cells and their large size allows for the identification of individual chromosomal sites (Hill et al. (1987) Int Rev Cytol 108:61-118). At this developmental stage, only the shorter mRNA was detected, corresponding to dMBD-likeΔ (see Example 6).

To visualize the banding pattern of the polytene chromosomes, the chromosomes were counterstained with DAPI, which stains brightest in the condensed, banded regions of euchromatin and the constitutively condensed heterochromatin at the chromocenter. Immunofluorescence staining with the antibody to dMBD-like revealed preferential association with 29 euchromatic sites as well as weaker association with ~100 euchromatic sites and centric heterochromatin. In addition to localization to a set of discrete sites within the euchromatic chromosome arms, dMBD-likeΔ is present at the chromocenter, a region of constitutive heterochromatin. No staining was observed with preimmune sera or at the hunchback-regulated Ubx locus. Interestingly, 69% of the predominant sites correspond to developmental puffs that are transcriptionally induced by pulses of the steroid hormone 20-hydroxyecdysone (ecdysone) during the late larval and prepupal periods (Ashburner et al. (1972) Results Probl Cell Differ 4:101-151).
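To make the reported percentage concrete, the following Python sketch converts it into an approximate site count. It assumes, which the text does not state explicitly, that the "predominant sites" are the 29 euchromatic sites showing preferential association; the variable names are illustrative only.

```python
# Values taken from the paragraph above.
PREDOMINANT_SITES = 29        # ASSUMPTION: "predominant sites" = the 29 preferential sites
PUFF_FRACTION_REPORTED = 0.69 # 69% reported to coincide with ecdysone-induced puffs

# Nearest whole number of sites implied by the reported percentage.
puff_sites = round(PREDOMINANT_SITES * PUFF_FRACTION_REPORTED)

# Convert the whole-site count back to a rounded percentage as a consistency check.
implied_percent = round(100 * puff_sites / PREDOMINANT_SITES)

print(f"Approximate number of puff-associated sites: {puff_sites}")
print(f"Percentage implied by that count: {implied_percent}%")
```

Under this assumption, the reported percentage corresponds to roughly 20 of the 29 sites. The exact count is not given in the text and is inferred here only by rounding.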

These binding studies indicate that dMBD-likeΔ is a component of an Mi-2 complex in flies and not a component of a SIN3-containing complex. Further, the Mi-2 complex in flies appears to have a role in either establishment or maintenance of heterochromatin.

Although disclosure has been provided in some detail by way of illustration and example for the purposes of clarity of understanding, it will be apparent to those skilled in the art that various changes and modifications can be practiced without departing from the spirit or scope of the disclosure. Accordingly, the foregoing descriptions and examples should not be construed as limiting.

1. A polynucleotide encoding a fusion polypeptide, wherein the polypeptide comprises: a) a localization domain, wherein the localization domain is a chromodomain or is a localization domain from a protein selected from the group consisting of methylated DNA-binding proteins (MBDs), DNA-N-methyl transferases (dNMTs) and MBD-like proteins; and b) a DNA binding domain that is heterologous to the localization domain, or functional fragment thereof.
2. The polynucleotide of claim 1, wherein the DNA-binding domain is a zinc finger DNA binding domain.
3. The polynucleotide of claim 1, wherein the localization domain is a methyl CpG binding domain.
4. The polynucleotide of claim 3, wherein the methyl CpG binding domain is from a gene involved in a disease state selected from the group consisting of ICF syndrome, Rett syndrome and Fragile X syndrome.
5. The polynucleotide of claim 1, wherein the protein from which the localization domain is obtained is selected from the group consisting of dMBD-like and dMBD-likeΔ.
6. The polynucleotide of claim 1, wherein the fusion polypeptide further comprises: c) a regulatory domain or functional fragment thereof.
7. The polynucleotide of claim 6, wherein the regulatory domain is an activation domain.
8. The polynucleotide of claim 7, wherein the activation domain comprises VP-16, p65 or functional fragments thereof.
9. The polynucleotide of claim 6, wherein the regulatory domain is a repression domain.
10. A cell comprising the polynucleotide of claim 1.
11. The cell of claim 10, wherein the cell is a plant cell.
12. The cell of claim 10, wherein the cell is an animal cell.